Posted to user@mahout.apache.org by Philippe Lamarche <ph...@gmail.com> on 2008/11/02 16:29:57 UTC

Re: Problems with KMeans clustering

Hi there,
It also works on 0.19.0-dev, that is on hadoop/branches/branch-0.19.

Over the next few days I intend to find out exactly what the problem is,
to make sure that it won't come back in a few revisions.
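
As a starting point: the stack trace quoted further down ends in
Double.parseDouble choking on the string "[". Here is a minimal sketch of
just that symptom. It assumes nothing about how Mahout's
DenseVector.decodeFormat actually tokenizes its input; the class and method
names below are made up for illustration.

```java
// Hypothetical demo class, not part of Mahout or Hadoop.
class ParseBracketDemo {

    // Tries to parse one token as a double and reports what happened.
    static String tryParse(String token) {
        try {
            return "parsed " + Double.parseDouble(token);
        } catch (NumberFormatException e) {
            // Same exception type as the "Caused by" line in the quoted trace.
            return "NumberFormatException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A lone bracket is not a number, so this takes the catch branch.
        System.out.println(tryParse("["));
        // A well-formed token parses normally.
        System.out.println(tryParse("1.5"));
    }
}
```

If the combiner were handed a string laid out differently from what the
decoder expects, a stray "[" reaching parseDouble would produce this kind of
message, but that is a guess on my part and not a diagnosis.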

Thanks!

On Thu, Oct 30, 2008 at 9:20 AM, Grant Ingersoll <gs...@apache.org> wrote:

> Hmm, I believe that patch has been applied in 18.2 (whatever that is) but
> it also looks like it has been applied to 0.17.3 branch as well.    So, it
> might be something else that "fixed" it.
>
> At any rate, glad to hear it works on trunk.
>
>
> On Oct 29, 2008, at 6:38 PM, Philippe Lamarche wrote:
>
>> I am not sure I understand the hadoop svn structure, however I was able to
>> make it work with hadoop trunk, or 0.20.0-dev.
>> It didn't work with hadoop/branch-0.18, with or without patch 4277.
>>
>>
>> Here is a copy-paste of the steps, once Hadoop is built and installed.  I am
>> using the same exact "apache-mahout-examples-0.1-dev.job", not rebuilt with
>> the 0.20.0-dev jars.
>>
>> It works!
>>
>> That would mean that the bug/feature is not related to
>> HADOOP-4277 <http://issues.apache.org/jira/browse/HADOOP-4277>,
>> and was reintroduced (or never taken away) in hadoop/trunk.
>>
>>
>> hadoop@phil:/usr/local/hadoop$ bin/hadoop namenode -format
>> 08/10/29 18:27:59 INFO namenode.NameNode: STARTUP_MSG:
>> /************************************************************
>> STARTUP_MSG: Starting NameNode
>> STARTUP_MSG:   host = phil/127.0.1.1
>> STARTUP_MSG:   args = [-format]
>> STARTUP_MSG:   version = 0.20.0-dev
>> STARTUP_MSG:   build =  -r ; compiled by 'philippe' on Wed Oct 29 18:25:08 EDT 2008
>> ************************************************************/
>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: supergroup=supergroup
>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
>> 08/10/29 18:28:00 INFO common.Storage: Image file of size 96 saved in 0 seconds.
>> 08/10/29 18:28:00 INFO common.Storage: Storage directory /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name has been successfully formatted.
>> 08/10/29 18:28:00 INFO namenode.NameNode: SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down NameNode at phil/127.0.1.1
>> ************************************************************/
>>
>> hadoop@phil:/usr/local/hadoop$ bin/hadoop dfs -put /home/philippe/synthetic_control.data testdata
>>
>> hadoop@phil:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/build/apache-mahout-examples-0.1-dev.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>> 08/10/29 18:28:45 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/29 18:28:46 INFO mapred.FileInputFormat: Total input paths to process : 1
>> 08/10/29 18:28:47 INFO mapred.JobClient: Running job:
>> job_200810291828_0002
>> 08/10/29 18:28:48 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/29 18:28:54 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/29 18:28:55 INFO mapred.JobClient:  map 100% reduce 0%
>> 08/10/29 18:28:56 INFO mapred.JobClient: Job complete:
>> job_200810291828_0002
>> 08/10/29 18:28:56 INFO mapred.JobClient: Counters: 7
>> 08/10/29 18:28:56 INFO mapred.JobClient:   File Systems
>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes read=291644
>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes written=323660
>> 08/10/29 18:28:56 INFO mapred.JobClient:   Job Counters
>> 08/10/29 18:28:56 INFO mapred.JobClient:     Launched map tasks=2
>> 08/10/29 18:28:56 INFO mapred.JobClient:     Data-local map tasks=2
>> 08/10/29 18:28:56 INFO mapred.JobClient:   Map-Reduce Framework
>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input records=600
>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input bytes=288374
>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map output records=600
>> 08/10/29 18:28:56 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/29 18:28:56 INFO mapred.FileInputFormat: Total input paths to process : 2
>> 08/10/29 18:28:56 INFO mapred.JobClient: Running job:
>> job_200810291828_0003
>> 08/10/29 18:28:57 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/29 18:29:03 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/29 18:29:05 INFO mapred.JobClient:  map 100% reduce 0%
>> 08/10/29 18:29:10 INFO mapred.JobClient:  map 100% reduce 100%
>> 08/10/29 18:29:11 INFO mapred.JobClient: Job complete:
>> job_200810291828_0003
>> 08/10/29 18:29:11 INFO mapred.JobClient: Counters: 16
>> 08/10/29 18:29:11 INFO mapred.JobClient:   File Systems
>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes read=323660
>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes written=9657
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes read=36119
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes written=72300
>> 08/10/29 18:29:11 INFO mapred.JobClient:   Job Counters
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched reduce tasks=1
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched map tasks=2
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Data-local map tasks=2
>> 08/10/29 18:29:11 INFO mapred.JobClient:   Map-Reduce Framework
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input groups=1
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine output records=28
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input records=600
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce output records=7
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output bytes=943020
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input bytes=323660
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine input records=1732
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output records=1732
>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input records=28
>> 08/10/29 18:29:11 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/29 18:29:11 INFO mapred.FileInputFormat: Total input paths to process : 2
>> 08/10/29 18:29:12 INFO mapred.JobClient: Running job:
>> job_200810291828_0004
>> 08/10/29 18:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/29 18:29:20 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/29 18:29:22 INFO mapred.JobClient:  map 100% reduce 0%
>> 08/10/29 18:29:27 INFO mapred.JobClient:  map 100% reduce 100%
>> 08/10/29 18:29:28 INFO mapred.JobClient: Job complete:
>> job_200810291828_0004
>> 08/10/29 18:29:28 INFO mapred.JobClient: Counters: 16
>> 08/10/29 18:29:28 INFO mapred.JobClient:   File Systems
>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes read=342974
>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes written=3002539
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes read=3018455
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes written=6036972
>> 08/10/29 18:29:28 INFO mapred.JobClient:   Job Counters
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched reduce tasks=1
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched map tasks=2
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Data-local map tasks=2
>> 08/10/29 18:29:28 INFO mapred.JobClient:   Map-Reduce Framework
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input groups=7
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine output records=0
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input records=600
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce output records=1591
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output bytes=3008903
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input bytes=323660
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine input records=0
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output records=1591
>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input records=1591
>> 08/10/29 18:29:28 INFO kmeans.KMeansDriver: Iteration 0
>> 08/10/29 18:29:28 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/29 18:29:28 INFO mapred.FileInputFormat: Total input paths to process : 2
>> 08/10/29 18:29:28 INFO mapred.JobClient: Running job:
>> job_200810291828_0005
>> 08/10/29 18:29:29 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/29 18:29:35 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/29 18:29:37 INFO mapred.JobClient:  map 100% reduce 0%
>> 08/10/29 18:29:41 INFO mapred.JobClient: Job complete:
>> job_200810291828_0005
>> 08/10/29 18:29:41 INFO mapred.JobClient: Counters: 16
>> 08/10/29 18:29:41 INFO mapred.JobClient:   File Systems
>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes read=342974
>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes written=8205
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes read=23227
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes written=46516
>> 08/10/29 18:29:41 INFO mapred.JobClient:   Job Counters
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched reduce tasks=1
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched map tasks=2
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Data-local map tasks=2
>> 08/10/29 18:29:41 INFO mapred.JobClient:   Map-Reduce Framework
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input groups=7
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine output records=10
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input records=600
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce output records=7
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output bytes=1136504
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input bytes=323660
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine input records=600
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output records=600
>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input records=10
>> 08/10/29 18:29:41 INFO kmeans.KMeansDriver: Iteration 1
>> 08/10/29 18:29:41 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/29 18:29:41 INFO mapred.FileInputFormat: Total input paths to process : 2
>> 08/10/29 18:29:42 INFO mapred.JobClient: Running job:
>> job_200810291828_0006
>> 08/10/29 18:29:43 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/29 18:29:50 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/29 18:29:51 INFO mapred.JobClient:  map 100% reduce 0%
>> 08/10/29 18:29:55 INFO mapred.JobClient:  map 100% reduce 100%
>> 08/10/29 18:29:56 INFO mapred.JobClient: Job complete:
>> job_200810291828_0006
>> 08/10/29 18:29:56 INFO mapred.JobClient: Counters: 16
>> 08/10/29 18:29:56 INFO mapred.JobClient:   File Systems
>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes read=340070
>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes written=8242
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes read=21265
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes written=42592
>> 08/10/29 18:29:56 INFO mapred.JobClient:   Job Counters
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched reduce tasks=1
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched map tasks=2
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Data-local map tasks=2
>> 08/10/29 18:29:56 INFO mapred.JobClient:   Map-Reduce Framework
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input groups=7
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine output records=10
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input records=600
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce output records=7
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output bytes=1023966
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input bytes=323660
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine input records=600
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output records=600
>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input records=10
>> 08/10/29 18:29:56 INFO kmeans.KMeansDriver: Iteration 2
>> 08/10/29 18:29:56 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/29 18:29:56 INFO mapred.FileInputFormat: Total input paths to process : 2
>> 08/10/29 18:29:56 INFO mapred.JobClient: Running job:
>> job_200810291828_0007
>> 08/10/29 18:29:57 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/29 18:30:03 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/29 18:30:05 INFO mapred.JobClient:  map 100% reduce 0%
>> 08/10/29 18:30:09 INFO mapred.JobClient: Job complete:
>> job_200810291828_0007
>> 08/10/29 18:30:09 INFO mapred.JobClient: Counters: 16
>> 08/10/29 18:30:09 INFO mapred.JobClient:   File Systems
>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes read=340144
>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes written=8280
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes read=21085
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes written=42232
>> 08/10/29 18:30:09 INFO mapred.JobClient:   Job Counters
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched reduce tasks=1
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched map tasks=2
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Data-local map tasks=2
>> 08/10/29 18:30:09 INFO mapred.JobClient:   Map-Reduce Framework
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input groups=7
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine output records=10
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input records=600
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce output records=7
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output bytes=1023681
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input bytes=323660
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine input records=600
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output records=600
>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input records=10
>> 08/10/29 18:30:09 INFO kmeans.KMeansDriver: Iteration 3
>> 08/10/29 18:30:09 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/29 18:30:09 INFO mapred.FileInputFormat: Total input paths to process : 2
>> 08/10/29 18:30:09 INFO mapred.JobClient: Running job:
>> job_200810291828_0008
>> 08/10/29 18:30:10 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/29 18:30:17 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/29 18:30:18 INFO mapred.JobClient:  map 100% reduce 0%
>> 08/10/29 18:30:22 INFO mapred.JobClient:  map 100% reduce 100%
>> 08/10/29 18:30:23 INFO mapred.JobClient: Job complete:
>> job_200810291828_0008
>> 08/10/29 18:30:23 INFO mapred.JobClient: Counters: 16
>> 08/10/29 18:30:23 INFO mapred.JobClient:   File Systems
>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes read=340220
>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes written=8250
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes read=21339
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes written=42740
>> 08/10/29 18:30:23 INFO mapred.JobClient:   Job Counters
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched reduce tasks=1
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched map tasks=2
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Data-local map tasks=2
>> 08/10/29 18:30:23 INFO mapred.JobClient:   Map-Reduce Framework
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input groups=7
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine output records=10
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input records=600
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce output records=7
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output bytes=1028419
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input bytes=323660
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine input records=600
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output records=600
>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input records=10
>> 08/10/29 18:30:23 INFO kmeans.KMeansDriver: Iteration 4
>> 08/10/29 18:30:23 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/29 18:30:23 INFO mapred.FileInputFormat: Total input paths to process : 2
>> 08/10/29 18:30:24 INFO mapred.JobClient: Running job:
>> job_200810291828_0009
>> 08/10/29 18:30:25 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/29 18:30:31 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/29 18:30:33 INFO mapred.JobClient:  map 100% reduce 0%
>> 08/10/29 18:30:37 INFO mapred.JobClient:  map 100% reduce 100%
>> 08/10/29 18:30:38 INFO mapred.JobClient: Job complete:
>> job_200810291828_0009
>> 08/10/29 18:30:38 INFO mapred.JobClient: Counters: 16
>> 08/10/29 18:30:38 INFO mapred.JobClient:   File Systems
>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes read=340160
>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes written=8200
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes read=21219
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes written=42500
>> 08/10/29 18:30:38 INFO mapred.JobClient:   Job Counters
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched reduce tasks=1
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched map tasks=2
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Data-local map tasks=2
>> 08/10/29 18:30:38 INFO mapred.JobClient:   Map-Reduce Framework
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input groups=7
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine output records=10
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input records=600
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce output records=7
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output bytes=1024899
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input bytes=323660
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine input records=600
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output records=600
>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input records=10
>> 08/10/29 18:30:38 INFO kmeans.KMeansDriver: Clustering
>> 08/10/29 18:30:38 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/29 18:30:38 INFO mapred.FileInputFormat: Total input paths to process : 2
>> 08/10/29 18:30:38 INFO mapred.JobClient: Running job:
>> job_200810291828_0010
>> 08/10/29 18:30:39 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/29 18:30:45 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/29 18:30:47 INFO mapred.JobClient: Job complete:
>> job_200810291828_0010
>> 08/10/29 18:30:47 INFO mapred.JobClient: Counters: 7
>> 08/10/29 18:30:47 INFO mapred.JobClient:   File Systems
>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes read=340060
>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes written=1020535
>> 08/10/29 18:30:47 INFO mapred.JobClient:   Job Counters
>> 08/10/29 18:30:47 INFO mapred.JobClient:     Launched map tasks=2
>> 08/10/29 18:30:47 INFO mapred.JobClient:     Data-local map tasks=2
>> 08/10/29 18:30:47 INFO mapred.JobClient:   Map-Reduce Framework
>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input records=600
>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input bytes=323660
>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map output records=600
>> 08/10/29 18:30:47 WARN mapred.JobClient: Use GenericOptionsParser for
>> parsing the arguments. Applications should implement Tool for the same.
>> 08/10/29 18:30:47 INFO mapred.FileInputFormat: Total input paths to process : 2
>> 08/10/29 18:30:48 INFO mapred.JobClient: Running job:
>> job_200810291828_0011
>> 08/10/29 18:30:49 INFO mapred.JobClient:  map 0% reduce 0%
>> 08/10/29 18:30:56 INFO mapred.JobClient:  map 50% reduce 0%
>> 08/10/29 18:30:57 INFO mapred.JobClient: Job complete:
>> job_200810291828_0011
>> 08/10/29 18:30:57 INFO mapred.JobClient: Counters: 7
>> 08/10/29 18:30:57 INFO mapred.JobClient:   File Systems
>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes read=1020535
>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes written=325460
>> 08/10/29 18:30:57 INFO mapred.JobClient:   Job Counters
>> 08/10/29 18:30:57 INFO mapred.JobClient:     Launched map tasks=2
>> 08/10/29 18:30:57 INFO mapred.JobClient:     Data-local map tasks=2
>> 08/10/29 18:30:57 INFO mapred.JobClient:   Map-Reduce Framework
>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input records=600
>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input bytes=1020535
>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map output records=600
>>
>>
>>
>>
>>
>> On Wed, Oct 29, 2008 at 11:10 AM, Philippe Lamarche <
>> philippe.lamarche@gmail.com> wrote:
>>
>>> I will!
>>>
>>>
>>> On 10/29/08, Grant Ingersoll <gs...@apache.org> wrote:
>>>
>>>>
>>>> Philippe, can you try the patch suggested by Arun Murthy on
>>>> core-user@hadoop.a.o?  See
>>>> http://issues.apache.org/jira/browse/HADOOP-4277
>>>>
>>>> I'm pretty swamped at the moment w/ ApacheCon coming up next week, but if
>>>> it does fix the issue, then maybe we should move forward to the 18.2
>>>> candidate (I don't think it has been released yet, those guys have a pretty
>>>> sophisticated build process going)
>>>>
>>>> -Grant
>>>>
>>>> On Oct 28, 2008, at 7:19 AM, Philippe Lamarche wrote:
>>>>
>>>> Ubuntu linux 2.6.24 <http://2.6.24.21>, with java-6-sun-1.6.0.07.
>>>>
>>>>>
>>>>> On Tue, Oct 28, 2008 at 7:03 AM, Grant Ingersoll <gsingers@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Just a single machine.  I didn't think we were using features either.  Are
>>>>>> you saying you can run the example using 0.18.1?
>>>>>>
>>>>>> BTW, Philippe, what JVM, O/S, etc. are you using?
>>>>>>
>>>>>> -Grant
>>>>>>
>>>>>>
>>>>>> On Oct 27, 2008, at 11:55 PM, Jeff Eastman wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Are you guys running on real Hadoop arrays? I can run the synthetic
>>>>>>> control example just fine on a single machine. That code is just trying to
>>>>>>> read a vector from a string. I'd be surprised if we were using any
>>>>>>> "features" but will watch the threads.
>>>>>>>
>>>>>>> Jeff
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Grant Ingersoll wrote:
>>>>>>>
>>>>>>>> I started a thread on core-user@hadoop.a.o:
>>>>>>>>
>>>>>>>> http://hadoop.markmail.org/message/cczunzfhpcqz6pis
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 27, 2008, at 9:49 PM, Grant Ingersoll wrote:
>>>>>>>>
>>>>>>>>> OK, I can confirm that the exact same code works with 0.17.2 and not w/
>>>>>>>>> 0.18.1.  So, it sounds like a bug in Hadoop, or we are relying on
>>>>>>>>> incorrect behavior in Hadoop.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 27, 2008, at 9:33 PM, Grant Ingersoll wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Unfortunately, I went straight from 0.17.2 to 0.18.1.  It was working on
>>>>>>>>>>> 0.17.2.
>>>>>>>>>>
>>>>>>>>>> BTW, are you saying the same exact code was working on 0.17.2 or are
>>>>>>>>>> you referring to some older Mahout code that worked on 17.2?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <gsingers@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Did this work with 0.18.0 or other prior versions for you?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I just updated to hadoop 0.18.1 and got a clean version of mahout from
>>>>>>>>>>>>> svn. However, I am having problems with KMeans, which can be traced
>>>>>>>>>>>>> down to:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
>>>>>>>>>>>>> 2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
>>>>>>>>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>>>>>>>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>>>>>>>>>>>>> Caused by: java.lang.NumberFormatException: For input string: "["
>>>>>>>>>>>>>     at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>>>>>>>>>>>>     at java.lang.Double.parseDouble(Double.java:510)
>>>>>>>>>>>>>     at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>>>>>>>>>>>>     at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>>>>>>>>>>>>     at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>>>>>>>>>>>>     at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>>>>>>>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>>>>>>>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>>>>>>>>>>>>     at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
>>>>>>>>>>>>>     ... 1 more
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
>>>>>>>>>>>>> 2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>>>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>     at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is while running the synthetic_control.data example, but I have the
>>>>>>>>>>>>> same problems with any other input data.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am able to run other map-reduce jobs without problems.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here is the output of the jar task:
>>>>>>>>>>>>>
>>>>>>>>>>>>> hadoop@philippe-vaio:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>>>>>>>>>> 08/10/25 19:09:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.JobClient: Running job: job_200810251826_0010
>>>>>>>>>>>>> 08/10/25 19:09:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>> 08/10/25 19:09:31 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job complete: job_200810251826_0010
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes read=291644
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes written=323660
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input bytes=288374
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map output records=600
>>>>>>>>>>>>> 08/10/25 19:09:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Running job: job_200810251826_0011
>>>>>>>>>>>>> 08/10/25 19:09:33 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>> 08/10/25 19:09:37 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>> 08/10/25 19:09:39 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>> 08/10/25 19:09:44 INFO mapred.JobClient:  map 100% reduce 16%
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job complete: job_200810251826_0011
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes read=323660
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes written=1447
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes read=1389
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes written=37878
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine output records=29
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce output records=1
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output bytes=943020
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input bytes=323660
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine input records=1760
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output records=1732
>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input records=1
>>>>>>>>>>>>> 08/10/25 19:09:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.JobClient: Running job: job_200810251826_0012
>>>>>>>>>>>>> 08/10/25 19:09:54 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>> 08/10/25 19:09:56 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>> 08/10/25 19:09:58 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job complete: job_200810251826_0012
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>> read=326554
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>> written=1137260
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes
>>>>>>>>>>>>> read=1147358
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes
>>>>>>>>>>>>> written=2304490
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched reduce
>>>>>>>>>>>>> tasks=1
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched map
>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Data-local map
>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input
>>>>>>>>>>>>> groups=1
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine output
>>>>>>>>>>>>> records=0
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>> records=600
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce output
>>>>>>>>>>>>> records=600
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>> bytes=1139660
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>> bytes=323660
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine input
>>>>>>>>>>>>> records=0
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>> records=600
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input
>>>>>>>>>>>>> records=600
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
>>>>>>>>>>>>> 08/10/25 19:10:02 WARN mapred.JobClient: Use
>>>>>>>>>>>>> GenericOptionsParser
>>>>>>>>>>>>> for
>>>>>>>>>>>>> parsing the arguments. Applications should implement Tool for
>>>>>>>>>>>>> the
>>>>>>>>>>>>> same.
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>> paths
>>>>>>>>>>>>> to
>>>>>>>>>>>>> process
>>>>>>>>>>>>> : 2
>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>> paths
>>>>>>>>>>>>> to
>>>>>>>>>>>>> process
>>>>>>>>>>>>> : 2
>>>>>>>>>>>>> 08/10/25 19:10:03 INFO mapred.JobClient: Running job:
>>>>>>>>>>>>> job_200810251826_0013
>>>>>>>>>>>>> 08/10/25 19:10:04 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>> 08/10/25 19:10:08 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>> 08/10/25 19:10:09 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>> 08/10/25 19:10:21 INFO mapred.JobClient: Task Id : attempt_200810251826_0013_r_000000_0, Status : FAILED
>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am not sure if I am doing something wrong here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Philippe.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --------------------------
>>>>>>>>>>>>>
>>>>>>>>>>>> Grant Ingersoll
>>>>>>>>>>>> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New
>>>>>>>>>>>> Orleans.
>>>>>>>>>>>> http://www.lucenebootcamp.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Lucene Helpful Hints:
>>>>>>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ

Re: Problems with KMeans clustering

Posted by Grant Ingersoll <gs...@apache.org>.
I think that would be good.  I'm going to be working on MAHOUT today  
and tomorrow, hopefully.  Finally have some free time at ApacheCon...

On Nov 5, 2008, at 11:08 PM, Palleti, Pallavi wrote:

> Hi all,
>
> The same is discussed here:
> https://issues.apache.org/jira/browse/MAHOUT-79
>
> I have a patch ready that fixes this issue. If no one is working on it, I
> can open an issue in JIRA and commit the patch.
>
> Thanks
> Pallavi
>
> -----Original Message-----
> From: Jeff Eastman [mailto:jdog@windwardsolutions.com]
> Sent: Thursday, November 06, 2008 5:50 AM
> To: mahout-user@lucene.apache.org
> Subject: Re: Problems with KMeans clustering
>
> Thanks Steve,
>
> That was a subtle change that was evidently made after Kmeans was
> implemented and did not show up until later when people such as Philippe
> and yourself ran it with real problems on real clusters. While the type
> signatures of the reducer and combiner are in fact the same, the values
> provided by the mapper and combiner are different and could indeed
> create the odd behavior that was reported.
>
> The algorithm's dependence upon run-once behavior is pretty fundamental,
> since summing of cluster centroids is done in the combiner and the
> reducer does a merge of those clusters. I'd be interested in exactly how
> you resolved this.
>
> It likely applies to some of the other clustering implementations too.
>
> Finally, can you explain why this problem no longer seems to occur with
> Hadoop trunk?
>
> Jeff
>
>
> Steve Schlosser wrote:
>> Hi folks
>>
>> A while back we upgraded our Hadoop cluster from 0.15 to 0.18.0, and I
>> found that Mahout Kmeans quit working.  I finally tracked it down to
>> the fact that the semantics of the combiner changed between 0.16,
>> 0.17, and 0.18 from run exactly once to run zero or more times (which
>> is in line with how Map/Reduce was originally specified).  See:
>> https://issues.apache.org/jira/browse/HADOOP-3586
>>
>> The Kmeans combiner depended on running exactly once, but on our new
>> cluster it was running multiple times, causing hard-to-discern errors.
>> Basically, the second time through the Combiner, it would throw an
>> exception saying that the formatting of the vector (serialized into a
>> Text) was failing.  In the end, I had to make some formatting changes to
>> the data output by the Mapper and the Combiner to match what the Reducer
>> expects, as well as changes to the Combiner input.  I ended up
>> having to hack the Mapper to output vectors that either the Combiner
>> or Reducer could take as input, and make the Combiner take in the same
>> input that it outputs and calculate convergence at each step.
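
The fix described above can be illustrated with a minimal, framework-free sketch.
It is not Mahout's or Hadoop's actual code: the names map_point, combine and
reduce_centroid, and the (count, sum) record shape, are hypothetical and chosen
only for illustration. The point it shows is that when the mapper emits records
in the same format the combiner emits, and the merge is a plain associative sum,
the combiner can run zero, one, or many times without changing the centroid.

```python
from typing import Iterable, List, Tuple

# Hypothetical record shape: a partial result for one cluster,
# (number of points, per-dimension sum of those points).
Partial = Tuple[int, List[float]]


def map_point(point: List[float]) -> Partial:
    """Mapper output: a single point is already a valid partial (count 1)."""
    return (1, list(point))


def combine(partials: Iterable[Partial]) -> Partial:
    """Merge partials into one partial of the same shape.

    Input and output formats match and the merge is a sum, so applying this
    step any number of times leaves the final result unchanged.
    """
    total_count = 0
    total_sum: List[float] = []
    for count, vec_sum in partials:
        if not total_sum:
            total_sum = [0.0] * len(vec_sum)
        total_count += count
        total_sum = [a + b for a, b in zip(total_sum, vec_sum)]
    return (total_count, total_sum)


def reduce_centroid(partials: Iterable[Partial]) -> List[float]:
    """Reducer: merge whatever arrives, then divide once to get the centroid."""
    count, vec_sum = combine(partials)
    return [s / count for s in vec_sum]


points = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
mapped = [map_point(p) for p in points]

# Combiner skipped entirely.
no_combine = reduce_centroid(mapped)
# Combiner run once per simulated map task.
once = reduce_centroid([combine(mapped[:2]), combine(mapped[2:])])
# Combiner run a second time on its own output.
twice = reduce_centroid([combine([combine(mapped[:2]), combine(mapped[2:])])])
```

All three paths give the same centroid. A combiner that instead expected raw
points as input and emitted summed clusters would fail, or silently miscount,
as soon as it was handed its own output.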
>>
>> My apologies if this has already been covered and put to rest - I  
>> just
>> happened upon this thread this afternoon.
>>
>> -steve
>>
>> On Sun, Nov 2, 2008 at 10:29 AM, Philippe Lamarche
>> <ph...@gmail.com> wrote:
>>
>>> Hi there,
>>> It also works on 0.19.0-dev, that is on hadoop/branches/branch-0.19.
>>>
>>> I intend in the next few days to try to find out what exactly is the problem
>>> to make sure that it won't come back in a few revisions.
>>>
>>> Thanks!
>>>
>>> On Thu, Oct 30, 2008 at 9:20 AM, Grant Ingersoll <gs...@apache.org> wrote:
>>>
>>>> Hmm, I believe that patch has been applied in 18.2 (whatever that is) but
>>>> it also looks like it has been applied to 0.17.3 branch as well.  So, it
>>>> might be something else that "fixed" it.
>>>>
>>>> At any rate, glad to hear it works on trunk.
>>>>
>>>>
>>>> On Oct 29, 2008, at 6:38 PM, Philippe Lamarche wrote:
>>>>
>>>>> I am not sure I understand the hadoop svn structure, however I was able to
>>>>> make it work with hadoop trunk, or 0.20.0-dev.
>>>>> It didn't work with hadoop/branch-0.18, with or without patch 4277.
>>>>>
>>>>>
>>>>> Here is a copy-paste of the steps, once Hadoop is built and installed.  I am
>>>>> using the same exact "apache-mahout-examples-0.1-dev.job", not rebuilt with
>>>>> the 0.20.0-dev jars.
>>>>>
>>>>> It works!
>>>>>
>>>>> That would mean that the bug/feature is not related to
>>>>> HADOOP-4277 <http://issues.apache.org/jira/browse/HADOOP-4277>,
>>>>> and was reintroduced (or never taken away) in hadoop/trunk.
>>>>>
>>>>>
>>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop namenode -format
>>>>> 08/10/29 18:27:59 INFO namenode.NameNode: STARTUP_MSG:
>>>>> /************************************************************
>>>>> STARTUP_MSG: Starting NameNode
>>>>> STARTUP_MSG:   host = phil/127.0.1.1
>>>>> STARTUP_MSG:   args = [-format]
>>>>> STARTUP_MSG:   version = 0.20.0-dev
>>>>> STARTUP_MSG:   build =  -r ; compiled by 'philippe' on Wed Oct 29 18:25:08 EDT 2008
>>>>> ************************************************************/
>>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: supergroup=supergroup
>>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
>>>>> 08/10/29 18:28:00 INFO common.Storage: Image file of size 96 saved in 0 seconds.
>>>>> 08/10/29 18:28:00 INFO common.Storage: Storage directory /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name has been successfully formatted.
>>>>> 08/10/29 18:28:00 INFO namenode.NameNode: SHUTDOWN_MSG:
>>>>> /************************************************************
>>>>> SHUTDOWN_MSG: Shutting down NameNode at phil/127.0.1.1
>>>>> ************************************************************/
>>>>>
>>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop dfs -put
>>>>> /home/philippe/synthetic_control.data testdata
>>>>>
>>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop jar
>>>>> /home/philippe/workspace/MahoutJava/examples/build/apache-mahout-examples-0.1-dev.job
>>>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>> 08/10/29 18:28:45 WARN mapred.JobClient: Use GenericOptionsParser
> for
>>>>> parsing the arguments. Applications should implement Tool for the
> same.
>>>>> 08/10/29 18:28:46 INFO mapred.FileInputFormat: Total input paths  
>>>>> to
>>>>> process
>>>>> : 1
>>>>> 08/10/29 18:28:47 INFO mapred.JobClient: Running job:
>>>>> job_200810291828_0002
>>>>> 08/10/29 18:28:48 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:28:54 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:28:55 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Job complete:
>>>>> job_200810291828_0002
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Counters: 7
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes  
>>>>> read=291644
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes
> written=323660
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Data-local map  
>>>>> tasks=2
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input  
>>>>> bytes=288374
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map output  
>>>>> records=600
>>>>> 08/10/29 18:28:56 WARN mapred.JobClient: Use GenericOptionsParser
> for
>>>>> parsing the arguments. Applications should implement Tool for the
> same.
>>>>> 08/10/29 18:28:56 INFO mapred.FileInputFormat: Total input paths  
>>>>> to
>>>>> process
>>>>> : 2
>>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Running job:
>>>>> job_200810291828_0003
>>>>> 08/10/29 18:28:57 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:29:03 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:29:05 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:29:10 INFO mapred.JobClient:  map 100% reduce 100%
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient: Job complete:
>>>>> job_200810291828_0003
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes  
>>>>> read=323660
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes
> written=9657
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes  
>>>>> read=36119
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes
> written=72300
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched reduce
> tasks=1
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Data-local map  
>>>>> tasks=2
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input groups=1
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine output
> records=28
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce output
> records=7
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output
> bytes=943020
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input  
>>>>> bytes=323660
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine input
> records=1732
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output
> records=1732
>>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input
> records=28
>>>>> 08/10/29 18:29:11 WARN mapred.JobClient: Use GenericOptionsParser
> for
>>>>> parsing the arguments. Applications should implement Tool for the
> same.
>>>>> 08/10/29 18:29:11 INFO mapred.FileInputFormat: Total input paths  
>>>>> to
>>>>> process
>>>>> : 2
>>>>> 08/10/29 18:29:12 INFO mapred.JobClient: Running job:
>>>>> job_200810291828_0004
>>>>> 08/10/29 18:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:29:20 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:29:22 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:29:27 INFO mapred.JobClient:  map 100% reduce 100%
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Job complete:
>>>>> job_200810291828_0004
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes  
>>>>> read=342974
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes
> written=3002539
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes
> read=3018455
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes
> written=6036972
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched reduce
> tasks=1
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Data-local map  
>>>>> tasks=2
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input groups=7
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine output
> records=0
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce output
> records=1591
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output
> bytes=3008903
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input  
>>>>> bytes=323660
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine input
> records=0
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output
> records=1591
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input
> records=1591
>>>>> 08/10/29 18:29:28 INFO kmeans.KMeansDriver: Iteration 0
>>>>> 08/10/29 18:29:28 WARN mapred.JobClient: Use GenericOptionsParser
> for
>>>>> parsing the arguments. Applications should implement Tool for the
> same.
>>>>> 08/10/29 18:29:28 INFO mapred.FileInputFormat: Total input paths  
>>>>> to
>>>>> process
>>>>> : 2
>>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Running job:
>>>>> job_200810291828_0005
>>>>> 08/10/29 18:29:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:29:35 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:29:37 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient: Job complete:
>>>>> job_200810291828_0005
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes  
>>>>> read=342974
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes
> written=8205
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes  
>>>>> read=23227
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes
> written=46516
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched reduce
> tasks=1
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Data-local map  
>>>>> tasks=2
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input groups=7
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine output
> records=10
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce output
> records=7
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output
> bytes=1136504
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input  
>>>>> bytes=323660
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine input
> records=600
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output  
>>>>> records=600
>>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input
> records=10
>>>>> 08/10/29 18:29:41 INFO kmeans.KMeansDriver: Iteration 1
>>>>> 08/10/29 18:29:41 WARN mapred.JobClient: Use GenericOptionsParser
> for
>>>>> parsing the arguments. Applications should implement Tool for the
> same.
>>>>> 08/10/29 18:29:41 INFO mapred.FileInputFormat: Total input paths  
>>>>> to
>>>>> process
>>>>> : 2
>>>>> 08/10/29 18:29:42 INFO mapred.JobClient: Running job:
>>>>> job_200810291828_0006
>>>>> 08/10/29 18:29:43 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:29:50 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:29:51 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:29:55 INFO mapred.JobClient:  map 100% reduce 100%
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Job complete:
>>>>> job_200810291828_0006
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes  
>>>>> read=340070
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes
> written=8242
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes  
>>>>> read=21265
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes
> written=42592
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched reduce
> tasks=1
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Data-local map  
>>>>> tasks=2
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input groups=7
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine output
> records=10
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce output
> records=7
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output
> bytes=1023966
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input  
>>>>> bytes=323660
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine input
> records=600
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output  
>>>>> records=600
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input
> records=10
>>>>> 08/10/29 18:29:56 INFO kmeans.KMeansDriver: Iteration 2
>>>>> 08/10/29 18:29:56 WARN mapred.JobClient: Use GenericOptionsParser
> for
>>>>> parsing the arguments. Applications should implement Tool for the
> same.
>>>>> 08/10/29 18:29:56 INFO mapred.FileInputFormat: Total input paths  
>>>>> to
>>>>> process
>>>>> : 2
>>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Running job:
>>>>> job_200810291828_0007
>>>>> 08/10/29 18:29:57 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:30:03 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:30:05 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Job complete:
>>>>> job_200810291828_0007
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes  
>>>>> read=340144
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes
> written=8280
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes  
>>>>> read=21085
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes
> written=42232
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched reduce
> tasks=1
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Data-local map  
>>>>> tasks=2
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input groups=7
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine output
> records=10
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce output
> records=7
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output
> bytes=1023681
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input  
>>>>> bytes=323660
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine input
> records=600
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output  
>>>>> records=600
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input
> records=10
>>>>> 08/10/29 18:30:09 INFO kmeans.KMeansDriver: Iteration 3
>>>>> 08/10/29 18:30:09 WARN mapred.JobClient: Use GenericOptionsParser
> for
>>>>> parsing the arguments. Applications should implement Tool for the
> same.
>>>>> 08/10/29 18:30:09 INFO mapred.FileInputFormat: Total input paths  
>>>>> to
>>>>> process
>>>>> : 2
>>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Running job:
>>>>> job_200810291828_0008
>>>>> 08/10/29 18:30:10 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:30:17 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:30:18 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:30:22 INFO mapred.JobClient:  map 100% reduce 100%
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient: Job complete:
>>>>> job_200810291828_0008
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes  
>>>>> read=340220
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes
> written=8250
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes  
>>>>> read=21339
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes
> written=42740
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched reduce
> tasks=1
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Data-local map  
>>>>> tasks=2
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input groups=7
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine output
> records=10
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce output
> records=7
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output
> bytes=1028419
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input  
>>>>> bytes=323660
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine input
> records=600
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output  
>>>>> records=600
>>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input
> records=10
>>>>> 08/10/29 18:30:23 INFO kmeans.KMeansDriver: Iteration 4
>>>>> 08/10/29 18:30:23 WARN mapred.JobClient: Use GenericOptionsParser
> for
>>>>> parsing the arguments. Applications should implement Tool for the
> same.
>>>>> 08/10/29 18:30:23 INFO mapred.FileInputFormat: Total input paths  
>>>>> to
>>>>> process
>>>>> : 2
>>>>> 08/10/29 18:30:24 INFO mapred.JobClient: Running job:
>>>>> job_200810291828_0009
>>>>> 08/10/29 18:30:25 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:30:31 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:30:33 INFO mapred.JobClient:  map 100% reduce 0%
>>>>> 08/10/29 18:30:37 INFO mapred.JobClient:  map 100% reduce 100%
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Job complete:
>>>>> job_200810291828_0009
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Counters: 16
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes  
>>>>> read=340160
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes
> written=8200
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes  
>>>>> read=21219
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes
> written=42500
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched reduce
> tasks=1
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Data-local map  
>>>>> tasks=2
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input groups=7
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine output
> records=10
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce output
> records=7
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output
> bytes=1024899
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input  
>>>>> bytes=323660
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine input
> records=600
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output  
>>>>> records=600
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input
> records=10
>>>>> 08/10/29 18:30:38 INFO kmeans.KMeansDriver: Clustering
>>>>> 08/10/29 18:30:38 WARN mapred.JobClient: Use GenericOptionsParser
> for
>>>>> parsing the arguments. Applications should implement Tool for the
> same.
>>>>> 08/10/29 18:30:38 INFO mapred.FileInputFormat: Total input paths  
>>>>> to
>>>>> process
>>>>> : 2
>>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Running job:
>>>>> job_200810291828_0010
>>>>> 08/10/29 18:30:39 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:30:45 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient: Job complete:
>>>>> job_200810291828_0010
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient: Counters: 7
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes  
>>>>> read=340060
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes
> written=1020535
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Data-local map  
>>>>> tasks=2
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input  
>>>>> bytes=323660
>>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map output  
>>>>> records=600
>>>>> 08/10/29 18:30:47 WARN mapred.JobClient: Use GenericOptionsParser
> for
>>>>> parsing the arguments. Applications should implement Tool for the
> same.
>>>>> 08/10/29 18:30:47 INFO mapred.FileInputFormat: Total input paths  
>>>>> to
>>>>> process
>>>>> : 2
>>>>> 08/10/29 18:30:48 INFO mapred.JobClient: Running job:
>>>>> job_200810291828_0011
>>>>> 08/10/29 18:30:49 INFO mapred.JobClient:  map 0% reduce 0%
>>>>> 08/10/29 18:30:56 INFO mapred.JobClient:  map 50% reduce 0%
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient: Job complete:
>>>>> job_200810291828_0011
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient: Counters: 7
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   File Systems
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes
> read=1020535
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes
> written=325460
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Job Counters
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Launched map tasks=2
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Data-local map  
>>>>> tasks=2
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Map-Reduce Framework
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input records=600
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input
> bytes=1020535
>>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map output  
>>>>> records=600
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Oct 29, 2008 at 11:10 AM, Philippe Lamarche <
>>>>> philippe.lamarche@gmail.com> wrote:
>>>>>
>>>>> I will!
>>>>>
>>>>>> On 10/29/08, Grant Ingersoll <gs...@apache.org> wrote:
>>>>>>
>>>>>>
>>>>>>> Philippe, can you try the patch suggested by Arun Murthy on
>>>>>>> core-user@hadoop.a.o?  See
>>>>>>> http://issues.apache.org/jira/browse/HADOOP-4277
>>>>>>>
>>>>>>> I'm pretty swamped at the moment w/ ApacheCon coming up next week, but
>>>>>>> if it does fix the issue, then maybe we should move forward to the 18.2
>>>>>>> candidate (I don't think it has been released yet, those guys have a
>>>>>>> pretty sophisticated build process going)
>>>>>>>
>>>>>>> -Grant
>>>>>>>
>>>>>>> On Oct 28, 2008, at 7:19 AM, Philippe Lamarche wrote:
>>>>>>>
>>>>>>> Ubuntu linux 2.6.24 <http://2.6.24.21>, with java-6-sun-1.6.0.07.
>>>>>>>
>>>>>>>
>>>>>>>> On Tue, Oct 28, 2008 at 7:03 AM, Grant Ingersoll
> <gsingers@apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Just a single machine.  I didn't think we were using features
> either.
>>>>>>>>
>>>>>>>>
>>>>>>>>> Are
>>>>>>>>> you saying you can run the example using 0.18.1?
>>>>>>>>>
>>>>>>>>> BTW, Philippe, what JVM, O/S, etc. are you using?
>>>>>>>>>
>>>>>>>>> -Grant
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 27, 2008, at 11:55 PM, Jeff Eastman wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Are you guys running on real Hadoop arrays? I can run the synthetic
>>>>>>>>>> control example just fine on a single machine. That code is just
>>>>>>>>>> trying to read a vector from a string. I'd be surprised if we were
>>>>>>>>>> using any "features" but will watch the threads.
>>>>>>>>>>
>>>>>>>>>> Jeff
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Grant Ingersoll wrote:
>>>>>>>>>>
>>>>>>>>>> I started a thread on core-user@hadoop.a.o:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> http://hadoop.markmail.org/message/cczunzfhpcqz6pis
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 27, 2008, at 9:49 PM, Grant Ingersoll wrote:
>>>>>>>>>>>
>>>>>>>>>>>> OK, I can confirm that the exact same code works with 0.17.2 and not
>>>>>>>>>>>> w/ 0.18.1.  So, it sounds like a bug in Hadoop, or we are relying on
>>>>>>>>>>>> incorrect behavior in Hadoop.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 27, 2008, at 9:33 PM, Grant Ingersoll wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, I went straight from 0.17.2 to 0.18.1.  It
> was
>>>>>>>>>>>>> working
>>>>>>>>>>>>>
>>>>>>>>>>>>> on
>>>>>>>>>>>>>
>>>>>>>>>>>>>> 0.17.2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> BTW, are you saying the same exact code was working on
> 0.17.2 or
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>> are
>>>>>>>>>>>>> you referring to some older Mahout code that worked on
> 17.2?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <
>>>>>>>>>>>>>
>>>>>>>>>>>>>> gsingers@apache.org
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Did this work with 0.18.0 or other prior versions for  
>>>>>>>>>>>>>> you?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I just updated to hadoop 0.18.1 and got a clean version of mahout from svn.
>>>>>>>>>>>>>>>> However, I am having problems with KMeans, that can be traced down to:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
>>>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
>>>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>>>>>>>>>>>>>>>> Caused by: java.lang.NumberFormatException: For input string: "["
>>>>>>>>>>>>>>>> at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>>>>>>>>>>>>>>> at java.lang.Double.parseDouble(Double.java:510)
>>>>>>>>>>>>>>>> at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>>>>>>>>>>>>>>> at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
>>>>>>>>>>>>>>>> ... 1 more
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
>>>>>>>>>>>>>>>> 2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>>>>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is while running the synthetic_control.data example, but I have
>>>>>>>>>>>>>>>> the same problems with any other input data.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am able to run other map-reduce jobs without problems.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here is the output of the jar task:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> hadoop@philippe-vaio:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>>>>>>>>>>>>> 08/10/25 19:09:27 WARN mapred.JobClient: Use
>>>>>>>>>>>>>>>> GenericOptionsParser
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> parsing the arguments. Applications should implement
> Tool for
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total
> input
>>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>>> : 1
>>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total
> input
>>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>>> : 1
>>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.JobClient: Running job:
>>>>>>>>>>>>>>>> job_200810251826_0010
>>>>>>>>>>>>>>>> 08/10/25 19:09:29 INFO mapred.JobClient:  map 0% reduce
> 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:31 INFO mapred.JobClient:  map 50%  
>>>>>>>>>>>>>>>> reduce
> 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job complete:
>>>>>>>>>>>>>>>> job_200810251826_0010
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>>>> read=291644
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>>>> written=323660
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Launched
> map
>>>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Data-local
> map
>>>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Map-Reduce
> Framework
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>>>> bytes=288374
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 WARN mapred.JobClient: Use
>>>>>>>>>>>>>>>> GenericOptionsParser
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> parsing the arguments. Applications should implement
> Tool for
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total
> input
>>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total
> input
>>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Running job:
>>>>>>>>>>>>>>>> job_200810251826_0011
>>>>>>>>>>>>>>>> 08/10/25 19:09:33 INFO mapred.JobClient:  map 0% reduce
> 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:37 INFO mapred.JobClient:  map 50%  
>>>>>>>>>>>>>>>> reduce
> 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:39 INFO mapred.JobClient:  map 100%
> reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:44 INFO mapred.JobClient:  map 100%
> reduce 16%
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job complete:
>>>>>>>>>>>>>>>> job_200810251826_0011
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>>>> read=323660
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>>>> written=1447
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local  
>>>>>>>>>>>>>>>> bytes
>>>>>>>>>>>>>>>> read=1389
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local  
>>>>>>>>>>>>>>>> bytes
>>>>>>>>>>>>>>>> written=37878
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched
> reduce
>>>>>>>>>>>>>>>> tasks=1
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched
> map
>>>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Data-local
> map
>>>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Map-Reduce
> Framework
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce
> input
>>>>>>>>>>>>>>>> groups=1
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine
> output
>>>>>>>>>>>>>>>> records=29
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce
> output
>>>>>>>>>>>>>>>> records=1
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>>>> bytes=943020
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>>>> bytes=323660
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine
> input
>>>>>>>>>>>>>>>> records=1760
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>>>> records=1732
>>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce
> input
>>>>>>>>>>>>>>>> records=1
>>>>>>>>>>>>>>>> 08/10/25 19:09:53 WARN mapred.JobClient: Use
>>>>>>>>>>>>>>>> GenericOptionsParser
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> parsing the arguments. Applications should implement
> Tool for
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total
> input
>>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total
> input
>>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.JobClient: Running job:
>>>>>>>>>>>>>>>> job_200810251826_0012
>>>>>>>>>>>>>>>> 08/10/25 19:09:54 INFO mapred.JobClient:  map 0% reduce
> 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:56 INFO mapred.JobClient:  map 50%  
>>>>>>>>>>>>>>>> reduce
> 0%
>>>>>>>>>>>>>>>> 08/10/25 19:09:58 INFO mapred.JobClient:  map 100%
> reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job complete:
>>>>>>>>>>>>>>>> job_200810251826_0012
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>>>> read=326554
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>>>> written=1137260
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local  
>>>>>>>>>>>>>>>> bytes
>>>>>>>>>>>>>>>> read=1147358
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local  
>>>>>>>>>>>>>>>> bytes
>>>>>>>>>>>>>>>> written=2304490
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched
> reduce
>>>>>>>>>>>>>>>> tasks=1
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched
> map
>>>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Data-local
> map
>>>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Map-Reduce
> Framework
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce
> input
>>>>>>>>>>>>>>>> groups=1
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine
> output
>>>>>>>>>>>>>>>> records=0
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce
> output
>>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>>>> bytes=1139660
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>>>> bytes=323660
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine
> input
>>>>>>>>>>>>>>>> records=0
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce
> input
>>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 WARN mapred.JobClient: Use
>>>>>>>>>>>>>>>> GenericOptionsParser
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> parsing the arguments. Applications should implement
> Tool for
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total
> input
>>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total
> input
>>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>>>> 08/10/25 19:10:03 INFO mapred.JobClient: Running job:
>>>>>>>>>>>>>>>> job_200810251826_0013
>>>>>>>>>>>>>>>> 08/10/25 19:10:04 INFO mapred.JobClient:  map 0% reduce
> 0%
>>>>>>>>>>>>>>>> 08/10/25 19:10:08 INFO mapred.JobClient:  map 50%  
>>>>>>>>>>>>>>>> reduce
> 0%
>>>>>>>>>>>>>>>> 08/10/25 19:10:09 INFO mapred.JobClient:  map 100%
> reduce 0%
>>>>>>>>>>>>>>>> 08/10/25 19:10:21 INFO mapred.JobClient: Task Id : attempt_200810251826_0013_r_000000_0, Status : FAILED
>>>>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am not sure if I am doing something wrong here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Philippe.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --------------------------
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Grant Ingersoll
>>>>>>>>>>>>>>> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US
> New
>>>>>>>>>>>>>>> Orleans.
>>>>>>>>>>>>>>> http://www.lucenebootcamp.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Lucene Helpful Hints:
>>>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ

RE: Problems with KMeans clustering

Posted by "Palleti, Pallavi" <pa...@corp.aol.com>.
Hi all,

 The same is discussed here:
https://issues.apache.org/jira/browse/MAHOUT-79

I have a patch for fixing this issue ready. If no one is working on it, I
can open an issue in JIRA and commit the same.

Thanks
Pallavi

-----Original Message-----
From: Jeff Eastman [mailto:jdog@windwardsolutions.com] 
Sent: Thursday, November 06, 2008 5:50 AM
To: mahout-user@lucene.apache.org
Subject: Re: Problems with KMeans clustering

Thanks Steve,

That was a subtle change that was evidently made after Kmeans was
implemented and did not show up until later when people such as Philippe
and yourself ran it with real problems on real clusters. While the type
signatures of the reducer and combiner are in fact the same, the values
provided by the mapper and combiner are different and could indeed
create the odd behavior that was reported.
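
To make that failure mode concrete, here is a minimal sketch. The two text
formats and all names below are made up for illustration and are not
Mahout's actual encodings; the only point is that a decoder written for one
format throws a NumberFormatException, like the one in the trace, when it is
handed the other format.

```java
/** Illustration only: these formats and names are hypothetical, not Mahout's. */
final class FormatMismatch {
    /** Format A (assumed mapper output): "1.0, 2.0". */
    static String encodePlain(double[] v) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < v.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(v[i]);
        }
        return sb.toString();
    }

    /** Format B (assumed combiner output): "[, 1.0, 2.0, ]". */
    static String encodeBracketed(double[] v) {
        return "[, " + encodePlain(v) + ", ]";
    }

    /** Decoder written for format A only: parses every comma-separated token. */
    static double[] decodePlain(String text) {
        String[] tokens = text.split(",");
        double[] v = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Throws NumberFormatException on a non-numeric token such as "[".
            v[i] = Double.parseDouble(tokens[i].trim());
        }
        return v;
    }
}
```

With these assumptions, decodePlain(encodePlain(v)) round-trips, while
decodePlain(encodeBracketed(v)) fails on its first token. A combiner that
reads format A but writes format B is fine if it runs once and breaks as
soon as it is fed its own output.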

The algorithm's dependence upon run-once behavior is pretty fundamental,
since summing of cluster centroids is done in the combiner and the
reducer does a merge of those clusters. I'd be interested in exactly how
you resolved this.
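
For reference, one general way to remove a run-once dependence is to give
the combiner the same input and output shape, for example a (sum, count)
partial, so that it can be applied zero, one, or many times. The sketch
below is plain Java with a hypothetical Partial class; it is not Mahout's
KMeansCombiner and not necessarily the change Steve made.

```java
import java.util.List;

/** Hypothetical partial-sum record for one cluster; not a Mahout class. */
final class Partial {
    final double[] sum;  // component-wise sum of the points folded in so far
    final long count;    // number of points folded in so far

    Partial(double[] sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    /** What a mapper would emit for a single point. */
    static Partial ofPoint(double[] point) {
        return new Partial(point.clone(), 1);
    }

    /** Combine step: Partials in, Partial out, so it can safely be re-applied. */
    static Partial combine(List<Partial> parts) {
        double[] total = new double[parts.get(0).sum.length];
        long n = 0;
        for (Partial p : parts) {
            for (int i = 0; i < total.length; i++) {
                total[i] += p.sum[i];
            }
            n += p.count;
        }
        return new Partial(total, n);
    }

    /** Reduce step: merges whatever Partials arrive and divides only once. */
    static double[] centroid(List<Partial> parts) {
        Partial all = combine(parts);
        double[] c = new double[all.sum.length];
        for (int i = 0; i < c.length; i++) {
            c[i] = all.sum[i] / all.count;
        }
        return c;
    }
}
```

Because the division happens only in centroid(), feeding it raw per-point
Partials, once-combined Partials, or twice-combined Partials gives the same
centroid for the same set of points.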

It likely applies to some of the other clustering implementations too.

Finally, can you explain why this problem no longer seems to occur with 
Hadoop trunk?

Jeff


Steve Schlosser wrote:
> Hi folks
>
> A while back we upgraded our Hadoop cluster from 0.15 to 0.18.0, and I
> found that Mahout Kmeans quit working.  I finally tracked it down to
> the fact that the semantics of the combiner changed between 0.16,
> 0.17, and 0.18 from run exactly once to run zero or more times (which
> is in line with how Map/Reduce was originally specified).  See:
> https://issues.apache.org/jira/browse/HADOOP-3586.
>
> The Kmeans combiner depended on running exactly once, but on our new
> cluster it was running multiple times, causing hard-to-discern errors.
>  Basically, the second time through the Combiner, it would throw an
> exception that the formatting of the vector (serialized into a Text)
> was failing.  In the end, I had to make some formatting changes to the
> data output by the Mapper and the Combiner to match what the Reducer
> expects, as well as changes to the Combiner input to .  I ended up
> having to hack the Mapper to output vectors that either the Combiner
> or Reducer could take as input, and make the Combiner take in the same
> input that it outputs and to calculate convergence at each step.
>
> My apologies if this has already been covered and put to rest - I just
> happened upon this thread this afternoon.
>
> -steve
>
> On Sun, Nov 2, 2008 at 10:29 AM, Philippe Lamarche
> <ph...@gmail.com> wrote:
>   
>> Hi there,
>> It also works on 0.19.0-dev, that is on hadoop/branches/branch-0.19.
>>
>> I intend in the next few days to try to find out what exactly is the
>> problem to make sure that it won't come back in a few revisions.
>>
>> Thanks!
>>
>> On Thu, Oct 30, 2008 at 9:20 AM, Grant Ingersoll <gs...@apache.org> wrote:
>>
>>     
>>> Hmm, I believe that patch has been applied in 18.2 (whatever that is) but
>>> it also looks like it has been applied to 0.17.3 branch as well.  So, it
>>> might be something else that "fixed" it.
>>>
>>> At any rate, glad to hear it works on trunk.
>>>
>>>
>>> On Oct 29, 2008, at 6:38 PM, Philippe Lamarche wrote:
>>>
>>>> I am not sure I understand the hadoop svn structure, however I was able to
>>>> make it work with hadoop trunk, or 0.20.0-dev.
>>>> It didn't work with hadoop/branch-0.18, with or without patch 4277.
>>>>
>>>>
>>>> Here is a copy-paste of the steps, once Hadoop is built and installed.  I am
>>>> using the same exact "apache-mahout-examples-0.1-dev.job", not rebuilt with
>>>> the 0.20.0-dev jars.
>>>>
>>>> It works!
>>>>
>>>> That would mean that the bug/feature is not related to
>>>> HADOOP-4277<http://issues.apache.org/jira/browse/HADOOP-4277>,
>>>>
>>>> and was reintroduced (or never took away) in hadoop/trunk.
>>>>
>>>>
>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop namenode -format
>>>> 08/10/29 18:27:59 INFO namenode.NameNode: STARTUP_MSG:
>>>> /************************************************************
>>>> STARTUP_MSG: Starting NameNode
>>>> STARTUP_MSG:   host = phil/127.0.1.1
>>>> STARTUP_MSG:   args = [-format]
>>>> STARTUP_MSG:   version = 0.20.0-dev
>>>> STARTUP_MSG:   build =  -r ; compiled by 'philippe' on Wed Oct 29 18:25:08 EDT 2008
>>>> ************************************************************/
>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: supergroup=supergroup
>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
>>>> 08/10/29 18:28:00 INFO common.Storage: Image file of size 96 saved in 0 seconds.
>>>> 08/10/29 18:28:00 INFO common.Storage: Storage directory /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name has been successfully formatted.
>>>> 08/10/29 18:28:00 INFO namenode.NameNode: SHUTDOWN_MSG:
>>>> /************************************************************
>>>> SHUTDOWN_MSG: Shutting down NameNode at phil/127.0.1.1
>>>> ************************************************************/
>>>>
>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop dfs -put
>>>> /home/philippe/synthetic_control.data testdata
>>>>
>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/build/apache-mahout-examples-0.1-dev.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>> 08/10/29 18:28:45 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:28:46 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>> 08/10/29 18:28:47 INFO mapred.JobClient: Running job: job_200810291828_0002
>>>> 08/10/29 18:28:48 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:28:54 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:28:55 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Job complete:
>>>> job_200810291828_0002
>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Counters: 7
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes read=291644
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes
written=323660
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input bytes=288374
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:28:56 WARN mapred.JobClient: Use GenericOptionsParser
for
>>>> parsing the arguments. Applications should implement Tool for the
same.
>>>> 08/10/29 18:28:56 INFO mapred.FileInputFormat: Total input paths to
>>>> process
>>>> : 2
>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Running job:
>>>> job_200810291828_0003
>>>> 08/10/29 18:28:57 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:29:03 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:29:05 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:29:10 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:29:11 INFO mapred.JobClient: Job complete: job_200810291828_0003
>>>> 08/10/29 18:29:11 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes read=323660
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes written=9657
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes read=36119
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes written=72300
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input groups=1
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine output records=28
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output bytes=943020
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine input records=1732
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output records=1732
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input records=28
>>>> 08/10/29 18:29:11 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:29:11 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:29:12 INFO mapred.JobClient: Running job: job_200810291828_0004
>>>> 08/10/29 18:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:29:20 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:29:22 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:29:27 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Job complete: job_200810291828_0004
>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes read=342974
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes written=3002539
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes read=3018455
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes written=6036972
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine output records=0
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce output records=1591
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output bytes=3008903
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine input records=0
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output records=1591
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input records=1591
>>>> 08/10/29 18:29:28 INFO kmeans.KMeansDriver: Iteration 0
>>>> 08/10/29 18:29:28 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:29:28 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Running job: job_200810291828_0005
>>>> 08/10/29 18:29:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:29:35 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:29:37 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:29:41 INFO mapred.JobClient: Job complete: job_200810291828_0005
>>>> 08/10/29 18:29:41 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes read=342974
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes written=8205
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes read=23227
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes written=46516
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output bytes=1136504
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:29:41 INFO kmeans.KMeansDriver: Iteration 1
>>>> 08/10/29 18:29:41 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:29:41 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:29:42 INFO mapred.JobClient: Running job: job_200810291828_0006
>>>> 08/10/29 18:29:43 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:29:50 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:29:51 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:29:55 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Job complete: job_200810291828_0006
>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes read=340070
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes written=8242
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes read=21265
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes written=42592
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output bytes=1023966
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:29:56 INFO kmeans.KMeansDriver: Iteration 2
>>>> 08/10/29 18:29:56 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:29:56 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Running job: job_200810291828_0007
>>>> 08/10/29 18:29:57 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:03 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:05 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Job complete: job_200810291828_0007
>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes read=340144
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes written=8280
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes read=21085
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes written=42232
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output bytes=1023681
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:30:09 INFO kmeans.KMeansDriver: Iteration 3
>>>> 08/10/29 18:30:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:30:09 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Running job: job_200810291828_0008
>>>> 08/10/29 18:30:10 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:17 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:18 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:30:22 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:30:23 INFO mapred.JobClient: Job complete: job_200810291828_0008
>>>> 08/10/29 18:30:23 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes read=340220
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes written=8250
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes read=21339
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes written=42740
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output bytes=1028419
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:30:23 INFO kmeans.KMeansDriver: Iteration 4
>>>> 08/10/29 18:30:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:30:23 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:30:24 INFO mapred.JobClient: Running job: job_200810291828_0009
>>>> 08/10/29 18:30:25 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:31 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:33 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:30:37 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Job complete: job_200810291828_0009
>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes read=340160
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes written=8200
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes read=21219
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes written=42500
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output bytes=1024899
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:30:38 INFO kmeans.KMeansDriver: Clustering
>>>> 08/10/29 18:30:38 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:30:38 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Running job: job_200810291828_0010
>>>> 08/10/29 18:30:39 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:45 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:47 INFO mapred.JobClient: Job complete: job_200810291828_0010
>>>> 08/10/29 18:30:47 INFO mapred.JobClient: Counters: 7
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes read=340060
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes written=1020535
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:30:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:30:47 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:30:48 INFO mapred.JobClient: Running job: job_200810291828_0011
>>>> 08/10/29 18:30:49 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:56 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:57 INFO mapred.JobClient: Job complete: job_200810291828_0011
>>>> 08/10/29 18:30:57 INFO mapred.JobClient: Counters: 7
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes read=1020535
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes written=325460
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input bytes=1020535
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map output records=600
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Oct 29, 2008 at 11:10 AM, Philippe Lamarche <philippe.lamarche@gmail.com> wrote:
>>>>
>>>>> I will!
>>>>>
>>>>> On 10/29/08, Grant Ingersoll <gs...@apache.org> wrote:
>>>>>
>>>>>           
>>>>>> Philippe, can you try the patch suggested by Arun Murthy on
>>>>>> core-user@hadoop.a.o?  See
>>>>>> http://issues.apache.org/jira/browse/HADOOP-4277
>>>>>>
>>>>>> I'm pretty swamped at the moment w/ ApacheCon coming up next week, but if
>>>>>> it does fix the issue, then maybe we should move forward to the 18.2
>>>>>> candidate (I don't think it has been released yet, those guys have a pretty
>>>>>> sophisticated build process going)
>>>>>>
>>>>>> -Grant
>>>>>>
>>>>>> On Oct 28, 2008, at 7:19 AM, Philippe Lamarche wrote:
>>>>>>
>>>>>> Ubuntu linux 2.6.24 <http://2.6.24.21>, with java-6-sun-1.6.0.07.
>>>>>>
>>>>>>> On Tue, Oct 28, 2008 at 7:03 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>>>>
>>>>>>>> Just a single machine.  I didn't think we were using features either.  Are
>>>>>>>> you saying you can run the example using 0.18.1?
>>>>>>>>
>>>>>>>> BTW, Philippe, what JVM, O/S, etc. are you using?
>>>>>>>>
>>>>>>>> -Grant
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 27, 2008, at 11:55 PM, Jeff Eastman wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Are you guys running on real Hadoop arrays? I can run the synthetic
>>>>>>>>> control example just fine on a single machine. That code is just trying to
>>>>>>>>> read a vector from a string. I'd be surprised if we were using any
>>>>>>>>> "features" but will watch the threads.
>>>>>>>>>
>>>>>>>>> Jeff
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Grant Ingersoll wrote:
>>>>>>>>>
>>>>>>>>>> I started a thread on core-user@hadoop.a.o:
>>>>>>>>>> http://hadoop.markmail.org/message/cczunzfhpcqz6pis
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 27, 2008, at 9:49 PM, Grant Ingersoll wrote:
>>>>>>>>>>
>>>>>>>>>>> OK, I can confirm that the exact same code works with 0.17.2 and not w/
>>>>>>>>>>> 0.18.1.  So, it sounds like a bug in Hadoop, or we are relying on
>>>>>>>>>>> incorrect behavior in Hadoop.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 27, 2008, at 9:33 PM, Grant Ingersoll wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, I went straight from 0.17.2 to 0.18.1.  It was working on
>>>>>>>>>>>>> 0.17.2.
>>>>>>>>>>>>
>>>>>>>>>>>> BTW, are you saying the same exact code was working on 0.17.2 or are
>>>>>>>>>>>> you referring to some older Mahout code that worked on 17.2?
>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Did this work with 0.18.0 or other prior versions for you?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>                           
>>>>>>>>>>>>>> On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I just updated to hadoop 0.18.1 and got a clean version of mahout from svn.
>>>>>>>>>>>>>>> However, I am having problems with KMeans, that can be traced down to:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
>>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
>>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>>>>>>>>>>>>>>> Caused by: java.lang.NumberFormatException: For input string: "["
>>>>>>>>>>>>>>> at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>>>>>>>>>>>>>> at java.lang.Double.parseDouble(Double.java:510)
>>>>>>>>>>>>>>> at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>>>>>>>>>>>>>> at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
>>>>>>>>>>>>>>> ... 1 more
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
>>>>>>>>>>>>>>> 2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>>>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is while running the synthetic_control.data example, but I have the
>>>>>>>>>>>>>>> same problems with any other input data.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am able to run other map-reduce jobs without problems.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is the output of the jar task:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> hadoop@philippe-vaio:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>>>>>>>>>>>> 08/10/25 19:09:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.JobClient: Running job: job_200810251826_0010
>>>>>>>>>>>>>>> 08/10/25 19:09:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:31 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job complete: job_200810251826_0010
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes read=291644
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes written=323660
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input bytes=288374
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map output records=600
>>>>>>>>>>>>>>> 08/10/25 19:09:32 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Running job: job_200810251826_0011
>>>>>>>>>>>>>>> 08/10/25 19:09:33 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:37 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:39 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:44 INFO mapred.JobClient:  map 100% reduce 16%
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job complete: job_200810251826_0011
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes read=323660
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes written=1447
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes read=1389
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes written=37878
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine output records=29
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce output records=1
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output bytes=943020
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input bytes=323660
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine input records=1760
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output records=1732
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input records=1
>>>>>>>>>>>>>>> 08/10/25 19:09:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.JobClient: Running job: job_200810251826_0012
>>>>>>>>>>>>>>> 08/10/25 19:09:54 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:56 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:58 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job complete: job_200810251826_0012
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes read=326554
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes written=1137260
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes read=1147358
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes written=2304490
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched reduce tasks=1
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched map tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Data-local map tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input groups=1
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine output records=0
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input records=600
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce output records=600
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output bytes=1139660
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input bytes=323660
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine input records=0
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output records=600
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input records=600
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
>>>>>>>>>>>>>>> 08/10/25 19:10:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>>>>>>>>>>>>> 08/10/25 19:10:03 INFO mapred.JobClient: Running job: job_200810251826_0013
>>>>>>>>>>>>>>> 08/10/25 19:10:04 INFO mapred.JobClient:  map 0% reduce
0%
>>>>>>>>>>>>>>> 08/10/25 19:10:08 INFO mapred.JobClient:  map 50% reduce
0%
>>>>>>>>>>>>>>> 08/10/25 19:10:09 INFO mapred.JobClient:  map 100%
reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:10:21 INFO mapred.JobClient: Task Id :
>>>>>>>>>>>>>>> attempt_200810251826_0013_r_000000_0, Status : FAILED
>>>>>>>>>>>>>>> java.io.IOException:
attempt_200810251826_0013_r_000000_0The
>>>>>>>>>>>>>>> reduce
>>>>>>>>>>>>>>> copier
>>>>>>>>>>>>>>> failed
>>>>>>>>>>>>>>> at
org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am not sure if I am doing something wrong here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Philippe.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --------------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>                               
>>>>>>>>>>>>>> Grant Ingersoll
>>>>>>>>>>>>>> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
>>>>>>>>>>>>>> http://www.lucenebootcamp.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Lucene Helpful Hints:
>>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ


Re: Problems with KMeans clustering

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Thanks Steve,

That was a subtle change that was evidently made after KMeans was
implemented and did not show up until later when people such as Philippe 
and yourself ran it with real problems on real clusters. While the type 
signatures of the reducer and combiner are in fact the same, the values 
provided by the mapper and combiner are different and could indeed 
create the odd behavior that was reported.
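
To make that failure mode concrete, here is a minimal sketch in plain Java, with no Hadoop or Mahout classes. The class name, the method names and both text formats ("x,y" for a point, "count;sum" for combiner output) are invented for illustration and are not Mahout's actual formats. It only shows how a combiner that assumes it sees raw mapper output breaks when it is handed its own output on a second pass:

```java
import java.util.Arrays;
import java.util.List;

public class RunOnceCombinerSketch {

    /** Hypothetical mapper output format: a bare point such as "1.0,2.0". */
    static double[] parsePoint(String text) {
        String[] parts = text.split(",");
        double[] point = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            point[i] = Double.parseDouble(parts[i]);
        }
        return point;
    }

    /** Combiner that assumes every input is a bare point and emits "count;sum". */
    static String combine(List<String> values) {
        double[] sum = null;
        for (String value : values) {
            double[] point = parsePoint(value);
            if (sum == null) {
                sum = new double[point.length];
            }
            for (int i = 0; i < point.length; i++) {
                sum[i] += point[i];
            }
        }
        StringBuilder out = new StringBuilder().append(values.size()).append(';');
        for (int i = 0; i < sum.length; i++) {
            out.append(i == 0 ? "" : ",").append(sum[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String first = combine(Arrays.asList("1.0,2.0", "3.0,4.0"));
        System.out.println("first pass:  " + first);
        try {
            // A second pass receives combiner output, which is not a bare point.
            combine(Arrays.asList(first, "5.0,6.0"));
        } catch (NumberFormatException e) {
            System.out.println("second pass: " + e);
        }
    }
}
```

The String type is the same on both sides of combine(), so nothing complains at compile time; the mismatch only shows up when the framework decides to run the combiner more than once.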

The algorithm's dependence upon run-once behavior is pretty fundamental, 
since summing of cluster centroids is done in the combiner and the 
reducer does a merge of those clusters. I'd be interested in exactly how 
you resolved this.
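
For reference, one common way to remove a run-once dependence is to make the combiner's input and output the same kind of value, a partial sum plus a point count, and to divide only in the reducer. The sketch below is plain Java under that assumption. It is not the Mahout code and not necessarily what Steve did, and the names Partial, fromPoint, combine and centroid are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RepeatableCombinerSketch {

    /** A partial cluster sum: component-wise sum of some points, and how many points it covers. */
    static final class Partial {
        final double[] sum;
        final long count;

        Partial(double[] sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Mapper side: wrap each point as a partial of count 1, so mapper and combiner output look alike. */
    static Partial fromPoint(double[] point) {
        return new Partial(point.clone(), 1);
    }

    /** Combiner: partials in, one partial out. Same type and meaning on both sides. */
    static Partial combine(List<Partial> partials) {
        double[] sum = new double[partials.get(0).sum.length];
        long count = 0;
        for (Partial p : partials) {
            for (int i = 0; i < sum.length; i++) {
                sum[i] += p.sum[i];
            }
            count += p.count;
        }
        return new Partial(sum, count);
    }

    /** Reducer: merge whatever partials arrive, then divide once to get the centroid. */
    static double[] centroid(List<Partial> partials) {
        Partial total = combine(partials);
        double[] c = new double[total.sum.length];
        for (int i = 0; i < c.length; i++) {
            c[i] = total.sum[i] / total.count;
        }
        return c;
    }

    public static void main(String[] args) {
        List<Partial> mapped = new ArrayList<>();
        for (double[] p : new double[][] {{1, 2}, {3, 4}, {5, 6}, {7, 8}}) {
            mapped.add(fromPoint(p));
        }

        // Combiner skipped entirely.
        double[] zero = centroid(mapped);

        // Combiner run once over everything.
        double[] once = centroid(Arrays.asList(combine(mapped)));

        // Combiner run on two spills, then again on their outputs.
        Partial a = combine(mapped.subList(0, 1));
        Partial b = combine(mapped.subList(1, 4));
        double[] twice = centroid(Arrays.asList(combine(Arrays.asList(a, b))));

        System.out.println(Arrays.toString(zero));
        System.out.println(Arrays.toString(once));
        System.out.println(Arrays.toString(twice));
    }
}
```

All three lines print [4.0, 5.0]: because summing partials is associative and the division happens only at the end, the result does not depend on how many times the combiner ran.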

It likely applies to some of the other clustering implementations too.

Finally, can you explain why this problem no longer seems to occur with 
Hadoop trunk?

Jeff


Steve Schlosser wrote:
> Hi folks
>
> A while back we upgraded our Hadoop cluster from 0.15 to 0.18.0, and I
> found that Mahout KMeans quit working.  I finally tracked it down to
> the fact that the semantics of the combiner changed between 0.16,
> 0.17, and 0.18 from "run exactly once" to "run zero or more times" (which
> is in line with how Map/Reduce was originally specified).  See:
> https://issues.apache.org/jira/browse/HADOOP-3586.
>
> The KMeans combiner depended on running exactly once, but on our new
> cluster it was running multiple times, causing hard-to-discern errors.
> Basically, the second time through the Combiner, it would throw an
> exception reporting that the formatting of the vector (serialized into a
> Text) was failing.  In the end, I had to make some formatting changes to
> the data output by the Mapper and the Combiner to match what the Reducer
> expects, as well as changes to the Combiner input.  I ended up
> having to hack the Mapper to output vectors that either the Combiner
> or Reducer could take as input, and make the Combiner take in the same
> input that it outputs and calculate convergence at each step.
>
> My apologies if this has already been covered and put to rest - I just
> happened upon this thread this afternoon.
>
> -steve
>
> On Sun, Nov 2, 2008 at 10:29 AM, Philippe Lamarche
> <ph...@gmail.com> wrote:
>   
>> Hi there,
>> It also works on 0.19.0-dev, that is on hadoop/branches/branch-0.19.
>>
>> I intend in the next few days to find out what exactly the problem is,
>> to make sure that it won't come back in a few revisions.
>>
>> Thanks!
>>
>> On Thu, Oct 30, 2008 at 9:20 AM, Grant Ingersoll <gs...@apache.org> wrote:
>>
>>     
>>> Hmm, I believe that patch has been applied in 18.2 (whatever that is) but
>>> it also looks like it has been applied to 0.17.3 branch as well.    So, it
>>> might be something else that "fixed" it.
>>>
>>> At any rate, glad to hear it works on trunk.
>>>
>>>
>>> On Oct 29, 2008, at 6:38 PM, Philippe Lamarche wrote:
>>>
>>>> I am not sure I understand the hadoop svn structure, however I was able to
>>>> make it work with hadoop trunk, or 0.20.0-dev.
>>>> It didn't work with hadoop/branch-0.18, with or without patch 4277.
>>>>
>>>>
>>>> Here is a copy-paste of the steps, once Hadoop is built and installed.  I am
>>>> using the same exact "apache-mahout-examples-0.1-dev.job", not rebuilt with
>>>> the 0.20.0-dev jars.
>>>>
>>>> It works!
>>>>
>>>> That would mean that the bug/feature is not related to
>>>> HADOOP-4277 <http://issues.apache.org/jira/browse/HADOOP-4277>,
>>>> and was reintroduced (or never taken away) in hadoop/trunk.
>>>>
>>>>
>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop namenode -format
>>>> 08/10/29 18:27:59 INFO namenode.NameNode: STARTUP_MSG:
>>>> /************************************************************
>>>> STARTUP_MSG: Starting NameNode
>>>> STARTUP_MSG:   host = phil/127.0.1.1
>>>> STARTUP_MSG:   args = [-format]
>>>> STARTUP_MSG:   version = 0.20.0-dev
>>>> STARTUP_MSG:   build =  -r ; compiled by 'philippe' on Wed Oct 29 18:25:08 EDT 2008
>>>> ************************************************************/
>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: supergroup=supergroup
>>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
>>>> 08/10/29 18:28:00 INFO common.Storage: Image file of size 96 saved in 0 seconds.
>>>> 08/10/29 18:28:00 INFO common.Storage: Storage directory /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name has been successfully formatted.
>>>> 08/10/29 18:28:00 INFO namenode.NameNode: SHUTDOWN_MSG:
>>>> /************************************************************
>>>> SHUTDOWN_MSG: Shutting down NameNode at phil/127.0.1.1
>>>> ************************************************************/
>>>>
>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop dfs -put /home/philippe/synthetic_control.data testdata
>>>>
>>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop jar /home/philippe/workspace/MahoutJava/examples/build/apache-mahout-examples-0.1-dev.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>> 08/10/29 18:28:45 WARN mapred.JobClient: Use GenericOptionsParser for
>>>> parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:28:46 INFO mapred.FileInputFormat: Total input paths to process : 1
>>>> 08/10/29 18:28:47 INFO mapred.JobClient: Running job:
>>>> job_200810291828_0002
>>>> 08/10/29 18:28:48 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:28:54 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:28:55 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Job complete:
>>>> job_200810291828_0002
>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Counters: 7
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes read=291644
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes written=323660
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input bytes=288374
>>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:28:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>>> parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:28:56 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:28:56 INFO mapred.JobClient: Running job:
>>>> job_200810291828_0003
>>>> 08/10/29 18:28:57 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:29:03 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:29:05 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:29:10 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:29:11 INFO mapred.JobClient: Job complete:
>>>> job_200810291828_0003
>>>> 08/10/29 18:29:11 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes read=323660
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes written=9657
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes read=36119
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes written=72300
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input groups=1
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine output records=28
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output bytes=943020
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine input records=1732
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output records=1732
>>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input records=28
>>>> 08/10/29 18:29:11 WARN mapred.JobClient: Use GenericOptionsParser for
>>>> parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:29:11 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:29:12 INFO mapred.JobClient: Running job:
>>>> job_200810291828_0004
>>>> 08/10/29 18:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:29:20 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:29:22 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:29:27 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Job complete:
>>>> job_200810291828_0004
>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes read=342974
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes written=3002539
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes read=3018455
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes written=6036972
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine output records=0
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce output records=1591
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output bytes=3008903
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine input records=0
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output records=1591
>>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input records=1591
>>>> 08/10/29 18:29:28 INFO kmeans.KMeansDriver: Iteration 0
>>>> 08/10/29 18:29:28 WARN mapred.JobClient: Use GenericOptionsParser for
>>>> parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:29:28 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:29:28 INFO mapred.JobClient: Running job:
>>>> job_200810291828_0005
>>>> 08/10/29 18:29:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:29:35 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:29:37 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:29:41 INFO mapred.JobClient: Job complete:
>>>> job_200810291828_0005
>>>> 08/10/29 18:29:41 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes read=342974
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes written=8205
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes read=23227
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes written=46516
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output bytes=1136504
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:29:41 INFO kmeans.KMeansDriver: Iteration 1
>>>> 08/10/29 18:29:41 WARN mapred.JobClient: Use GenericOptionsParser for
>>>> parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:29:41 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:29:42 INFO mapred.JobClient: Running job:
>>>> job_200810291828_0006
>>>> 08/10/29 18:29:43 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:29:50 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:29:51 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:29:55 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Job complete:
>>>> job_200810291828_0006
>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes read=340070
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes written=8242
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes read=21265
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes written=42592
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output bytes=1023966
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:29:56 INFO kmeans.KMeansDriver: Iteration 2
>>>> 08/10/29 18:29:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>>> parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:29:56 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:29:56 INFO mapred.JobClient: Running job:
>>>> job_200810291828_0007
>>>> 08/10/29 18:29:57 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:03 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:05 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Job complete:
>>>> job_200810291828_0007
>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes read=340144
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes written=8280
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes read=21085
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes written=42232
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output bytes=1023681
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:30:09 INFO kmeans.KMeansDriver: Iteration 3
>>>> 08/10/29 18:30:09 WARN mapred.JobClient: Use GenericOptionsParser for
>>>> parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:30:09 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:30:09 INFO mapred.JobClient: Running job:
>>>> job_200810291828_0008
>>>> 08/10/29 18:30:10 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:17 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:18 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:30:22 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:30:23 INFO mapred.JobClient: Job complete:
>>>> job_200810291828_0008
>>>> 08/10/29 18:30:23 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes read=340220
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes written=8250
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes read=21339
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes written=42740
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output bytes=1028419
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:30:23 INFO kmeans.KMeansDriver: Iteration 4
>>>> 08/10/29 18:30:23 WARN mapred.JobClient: Use GenericOptionsParser for
>>>> parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:30:23 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:30:24 INFO mapred.JobClient: Running job:
>>>> job_200810291828_0009
>>>> 08/10/29 18:30:25 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:31 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:33 INFO mapred.JobClient:  map 100% reduce 0%
>>>> 08/10/29 18:30:37 INFO mapred.JobClient:  map 100% reduce 100%
>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Job complete:
>>>> job_200810291828_0009
>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Counters: 16
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes read=340160
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes written=8200
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes read=21219
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes written=42500
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched reduce tasks=1
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input groups=7
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine output records=10
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce output records=7
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output bytes=1024899
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine input records=600
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input records=10
>>>> 08/10/29 18:30:38 INFO kmeans.KMeansDriver: Clustering
>>>> 08/10/29 18:30:38 WARN mapred.JobClient: Use GenericOptionsParser for
>>>> parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:30:38 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:30:38 INFO mapred.JobClient: Running job:
>>>> job_200810291828_0010
>>>> 08/10/29 18:30:39 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:45 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:47 INFO mapred.JobClient: Job complete:
>>>> job_200810291828_0010
>>>> 08/10/29 18:30:47 INFO mapred.JobClient: Counters: 7
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes read=340060
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes written=1020535
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input bytes=323660
>>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map output records=600
>>>> 08/10/29 18:30:47 WARN mapred.JobClient: Use GenericOptionsParser for
>>>> parsing the arguments. Applications should implement Tool for the same.
>>>> 08/10/29 18:30:47 INFO mapred.FileInputFormat: Total input paths to process : 2
>>>> 08/10/29 18:30:48 INFO mapred.JobClient: Running job:
>>>> job_200810291828_0011
>>>> 08/10/29 18:30:49 INFO mapred.JobClient:  map 0% reduce 0%
>>>> 08/10/29 18:30:56 INFO mapred.JobClient:  map 50% reduce 0%
>>>> 08/10/29 18:30:57 INFO mapred.JobClient: Job complete:
>>>> job_200810291828_0011
>>>> 08/10/29 18:30:57 INFO mapred.JobClient: Counters: 7
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   File Systems
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes read=1020535
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes written=325460
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Job Counters
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Launched map tasks=2
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Data-local map tasks=2
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Map-Reduce Framework
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input records=600
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input bytes=1020535
>>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map output records=600
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Oct 29, 2008 at 11:10 AM, Philippe Lamarche <
>>>> philippe.lamarche@gmail.com> wrote:
>>>>
>>>>  I will!
>>>>         
>>>>> On 10/29/08, Grant Ingersoll <gs...@apache.org> wrote:
>>>>>
>>>>>           
>>>>>> Philippe, can you try the patch suggested by Arun Murthy on
>>>>>> core-user@hadoop.a.o?  See
>>>>>> http://issues.apache.org/jira/browse/HADOOP-4277
>>>>>>
>>>>>> I'm pretty swamped at the moment w/ ApacheCon coming up next week, but if
>>>>>> it does fix the issue, then maybe we should move forward to the 18.2
>>>>>> candidate (I don't think it has been released yet, those guys have a pretty
>>>>>> sophisticated build process going)
>>>>>>
>>>>>> -Grant
>>>>>>
>>>>>> On Oct 28, 2008, at 7:19 AM, Philippe Lamarche wrote:
>>>>>>
>>>>>> Ubuntu linux 2.6.24 <http://2.6.24.21>, with java-6-sun-1.6.0.07.
>>>>>>
>>>>>>             
>>>>>>> On Tue, Oct 28, 2008 at 7:03 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>>>>
>>>>>>>> Just a single machine.  I didn't think we were using features either.  Are
>>>>>>>> you saying you can run the example using 0.18.1?
>>>>>>>>
>>>>>>>> BTW, Philippe, what JVM, O/S, etc. are you using?
>>>>>>>>
>>>>>>>> -Grant
>>>>>>>>
>>>>>>>>
>>>>>>>> On Oct 27, 2008, at 11:55 PM, Jeff Eastman wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Are you guys running on real Hadoop arrays? I can run the synthetic
>>>>>>>>> control example just fine on a single machine. That code is just trying to
>>>>>>>>> read a vector from a string. I'd be surprised if we were using any
>>>>>>>>> "features" but will watch the threads.
>>>>>>>>>
>>>>>>>>> Jeff
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Grant Ingersoll wrote:
>>>>>>>>>
>>>>>>>>> I started a thread on core-user@hadoop.a.o:
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>>>> http://hadoop.markmail.org/message/cczunzfhpcqz6pis
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 27, 2008, at 9:49 PM, Grant Ingersoll wrote:
>>>>>>>>>>
>>>>>>>>>>> OK, I can confirm that the exact same code works with 0.17.2 and not w/
>>>>>>>>>>> 0.18.1.  So, it sounds like a bug in Hadoop, or we are relying on
>>>>>>>>>>> incorrect behavior in Hadoop.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 27, 2008, at 9:33 PM, Grant Ingersoll wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Unfortunately, I went straight from 0.17.2 to 0.18.1.  It was working on
>>>>>>>>>>>>> 0.17.2.
>>>>>>>>>>>>
>>>>>>>>>>>> BTW, are you saying the same exact code was working on 0.17.2 or are
>>>>>>>>>>>> you referring to some older Mahout code that worked on 17.2?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>  On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <
>>>>>>>>>>>>                         
>>>>>>>>>>>>> gsingers@apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>                           
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>> Did this work with 0.18.0 or other prior versions for you?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>                           
>>>>>>>>>>>>>> On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  I just updated to hadoop 0.18.1 and got a clean version of
>>>>>>>>>>>>>>                             
>>>>>>>>>>>>>>> mahout
>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>> svn.
>>>>>>>>>>>>>>> However, I am having problems with KMeans, that can be traced
>>>>>>>>>>>>>>> down
>>>>>>>>>>>>>>> to :
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
>>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
>>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810251826_0013_r_000000_0 Merge of the inmemory files threw an exception: java.io.IOException: Intermedate merge failed
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>>>>>>>>>>>>>>> Caused by: java.lang.NumberFormatException: For input string: "["
>>>>>>>>>>>>>>> at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>>>>>>>>>>>>>> at java.lang.Double.parseDouble(Double.java:510)
>>>>>>>>>>>>>>> at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>>>>>>>>>>>>>> at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
>>>>>>>>>>>>>>> ... 1 more
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
>>>>>>>>>>>>>>> 2008-10-25 19:10:17,000 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>>>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is while running the synthetic_control.data example, but I
>>>>>>>>>>>>>>> have
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> same problems with any other input data.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am able to do other map-reduce job without problems.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here is the output of the jar task:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> hadoop@philippe-vaio:/usr/local/hadoop$ bin/hadoop jar
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar
>>>>>>>>>>>>>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>>>>>>>>>>>> 08/10/25 19:09:27 WARN mapred.JobClient: Use
>>>>>>>>>>>>>>> GenericOptionsParser
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> parsing the arguments. Applications should implement Tool for
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>> : 1
>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>> : 1
>>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.JobClient: Running job:
>>>>>>>>>>>>>>> job_200810251826_0010
>>>>>>>>>>>>>>> 08/10/25 19:09:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:31 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job complete:
>>>>>>>>>>>>>>> job_200810251826_0010
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>>> read=291644
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>>> written=323660
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Launched map
>>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Data-local map
>>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>>> bytes=288374
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>> 08/10/25 19:09:32 WARN mapred.JobClient: Use
>>>>>>>>>>>>>>> GenericOptionsParser
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> parsing the arguments. Applications should implement Tool for
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Running job:
>>>>>>>>>>>>>>> job_200810251826_0011
>>>>>>>>>>>>>>> 08/10/25 19:09:33 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:37 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:39 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:44 INFO mapred.JobClient:  map 100% reduce 16%
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job complete:
>>>>>>>>>>>>>>> job_200810251826_0011
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>>> read=323660
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>>> written=1447
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes
>>>>>>>>>>>>>>> read=1389
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes
>>>>>>>>>>>>>>> written=37878
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched reduce
>>>>>>>>>>>>>>> tasks=1
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched map
>>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Data-local map
>>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input
>>>>>>>>>>>>>>> groups=1
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine output
>>>>>>>>>>>>>>> records=29
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce output
>>>>>>>>>>>>>>> records=1
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>>> bytes=943020
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>>> bytes=323660
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine input
>>>>>>>>>>>>>>> records=1760
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>>> records=1732
>>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input
>>>>>>>>>>>>>>> records=1
>>>>>>>>>>>>>>> 08/10/25 19:09:53 WARN mapred.JobClient: Use
>>>>>>>>>>>>>>> GenericOptionsParser
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> parsing the arguments. Applications should implement Tool for
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.JobClient: Running job:
>>>>>>>>>>>>>>> job_200810251826_0012
>>>>>>>>>>>>>>> 08/10/25 19:09:54 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:56 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:09:58 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job complete:
>>>>>>>>>>>>>>> job_200810251826_0012
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>>> read=326554
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>>> written=1137260
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes
>>>>>>>>>>>>>>> read=1147358
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes
>>>>>>>>>>>>>>> written=2304490
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched reduce
>>>>>>>>>>>>>>> tasks=1
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched map
>>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Data-local map
>>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input
>>>>>>>>>>>>>>> groups=1
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine output
>>>>>>>>>>>>>>> records=0
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce output
>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>>> bytes=1139660
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>>> bytes=323660
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine input
>>>>>>>>>>>>>>> records=0
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input
>>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
>>>>>>>>>>>>>>> 08/10/25 19:10:02 WARN mapred.JobClient: Use
>>>>>>>>>>>>>>> GenericOptionsParser
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> parsing the arguments. Applications should implement Tool for
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> process
>>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>>> 08/10/25 19:10:03 INFO mapred.JobClient: Running job:
>>>>>>>>>>>>>>> job_200810251826_0013
>>>>>>>>>>>>>>> 08/10/25 19:10:04 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:10:08 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:10:09 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>>> 08/10/25 19:10:21 INFO mapred.JobClient: Task Id : attempt_200810251826_0013_r_000000_0, Status : FAILED
>>>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The reduce copier failed
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am not sure if I am doing something wrong here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Philippe.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --------------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>                               
>>>>>>>>>>>>>> Grant Ingersoll
>>>>>>>>>>>>>> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New
>>>>>>>>>>>>>> Orleans.
>>>>>>>>>>>>>> http://www.lucenebootcamp.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Lucene Helpful Hints:
>>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ


Re: Problems with KMeans clustering

Posted by Steve Schlosser <sw...@gmail.com>.
Hi folks,

A while back we upgraded our Hadoop cluster from 0.15 to 0.18.0, and I
found that Mahout KMeans quit working.  I finally tracked it down to
the fact that the semantics of the combiner changed between 0.16,
0.17, and 0.18, from running exactly once to running zero or more
times (which is in line with how Map/Reduce was originally specified).
See: https://issues.apache.org/jira/browse/HADOOP-3586.

The KMeans combiner depended on running exactly once, but on our new
cluster it was running multiple times, causing hard-to-discern errors.
Basically, the second time through the Combiner, it would throw an
exception because decoding the vector (serialized into a Text) failed.
In the end, I had to make some formatting changes to the data output
by the Mapper and the Combiner to match what the Reducer expects, as
well as changes to the Combiner input.  I ended up having to hack the
Mapper to output vectors that either the Combiner or the Reducer could
take as input, and to make the Combiner take in the same format that
it outputs and calculate convergence at each step.
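
To illustrate the idea, here is a minimal sketch in plain Java (no
Hadoop dependency) of a combiner whose output has the same format as
its input, so it gives the same final result whether it runs zero,
one, or several times.  The names PartialSum and CombinerSketch and
the "count;v0,v1,..." text encoding are made up for this example; they
are not Mahout's actual classes or vector format, and this is not the
patch described above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical partial result for one cluster: a point count and a per-dimension sum. */
final class PartialSum {
    final long count;
    final double[] sum;

    PartialSum(long count, double[] sum) {
        this.count = count;
        this.sum = sum;
    }

    /** What a mapper would emit: a single point is a partial sum with count 1. */
    static PartialSum ofPoint(double[] point) {
        return new PartialSum(1L, point.clone());
    }

    /** Assumed text encoding shared by mapper, combiner and reducer: "count;v0,v1,...". */
    String encode() {
        StringBuilder sb = new StringBuilder().append(count).append(';');
        for (int i = 0; i < sum.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(sum[i]);
        }
        return sb.toString();
    }

    static PartialSum decode(String text) {
        String[] parts = text.split(";", 2);
        String[] fields = parts[1].split(",");
        double[] sum = new double[fields.length];
        for (int i = 0; i < fields.length; i++) sum[i] = Double.parseDouble(fields[i]);
        return new PartialSum(Long.parseLong(parts[0]), sum);
    }
}

final class CombinerSketch {
    /** Adds up counts and sums of all encoded values for one key. */
    static PartialSum merge(List<String> values) {
        if (values.isEmpty()) throw new IllegalArgumentException("no values for key");
        long count = 0;
        double[] sum = null;
        for (String v : values) {
            PartialSum p = PartialSum.decode(v);
            if (sum == null) sum = new double[p.sum.length];
            count += p.count;
            for (int i = 0; i < sum.length; i++) sum[i] += p.sum[i];
        }
        return new PartialSum(count, sum);
    }

    /** Combiner: same format in and out, so re-running it on its own output is harmless. */
    static String combine(List<String> values) {
        return merge(values).encode();
    }

    /** Reducer: accepts mapper or combiner output alike and divides only once, at the end. */
    static double[] reduce(List<String> values) {
        PartialSum total = merge(values);
        double[] centroid = new double[total.sum.length];
        for (int i = 0; i < centroid.length; i++) centroid[i] = total.sum[i] / total.count;
        return centroid;
    }

    public static void main(String[] args) {
        List<String> points = Arrays.asList(
            PartialSum.ofPoint(new double[] {1.0, 2.0}).encode(),
            PartialSum.ofPoint(new double[] {3.0, 4.0}).encode(),
            PartialSum.ofPoint(new double[] {5.0, 6.0}).encode());
        // Combiner never runs.
        System.out.println(Arrays.toString(reduce(points)));
        // Combiner runs twice on part of the data before the reducer sees it.
        String once = combine(points.subList(0, 2));
        String twice = combine(Collections.singletonList(once));
        System.out.println(Arrays.toString(reduce(Arrays.asList(twice, points.get(2)))));
    }
}
```

The point of the design is that the division that turns a sum into a
centroid happens only in the reducer; everything before it is a plain
associative merge, which is what makes repeated combiner passes safe.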

My apologies if this has already been covered and put to rest - I just
happened upon this thread this afternoon.

-steve

On Sun, Nov 2, 2008 at 10:29 AM, Philippe Lamarche
<ph...@gmail.com> wrote:
> Hi there,
> It also works on 0.19.0-dev, that is on hadoop/branches/branch-0.19.
>
> I intend in the next few day to try to find out what exactly is the problem
> to make sure that it won't come back in a few revisions.
>
> Thanks!
>
> On Thu, Oct 30, 2008 at 9:20 AM, Grant Ingersoll <gs...@apache.org>wrote:
>
>> Hmm, I believe that patch has been applied in 18.2 (whatever that is) but
>> it also looks like it has been applied to 0.17.3 branch as well.    So, it
>> might be something else that "fixed" it.
>>
>> At any rate, glad to hear it works on trunk.
>>
>>
>> On Oct 29, 2008, at 6:38 PM, Philippe Lamarche wrote:
>>
>>  I am not sure I understand the hadoop svn structure, however I was able to
>>> make it work with hadoop trunk, or 0.20.0-dev.
>>> It didn't work with hadoop/branch-0.18, with or without patch 4277.
>>>
>>>
>>> Here is a copy-paste of the steps, once Hadoop is built and installed.  I
>>> am
>>> using the same exact "apache-mahout-examples-0.1-dev.job", not rebuilt
>>> with
>>> the 0.20.0-dev jars.
>>>
>>> It works!
>>>
>>> That would mean that the bug/feature is not related to
>>> HADOOP-4277<http://issues.apache.org/jira/browse/HADOOP-4277>,
>>>
>>> and was reintroduced (or never took away) in hadoop/trunk.
>>>
>>>
>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop namenode -format
>>> 08/10/29 18:27:59 INFO namenode.NameNode: STARTUP_MSG:
>>> /************************************************************
>>> STARTUP_MSG: Starting NameNode
>>> STARTUP_MSG:   host = phil/127.0.1.1
>>> STARTUP_MSG:   args = [-format]
>>> STARTUP_MSG:   version = 0.20.0-dev
>>> STARTUP_MSG:   build =  -r ; compiled by 'philippe' on Wed Oct 29 18:25:08
>>> EDT 2008
>>> ************************************************************/
>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: supergroup=supergroup
>>> 08/10/29 18:28:00 INFO namenode.FSNamesystem: isPermissionEnabled=true
>>> 08/10/29 18:28:00 INFO common.Storage: Image file of size 96 saved in 0
>>> seconds.
>>> 08/10/29 18:28:00 INFO common.Storage: Storage directory
>>> /usr/local/hadoop-datastore/hadoop-hadoop/dfs/name has been successfully
>>> formatted.
>>> 08/10/29 18:28:00 INFO namenode.NameNode: SHUTDOWN_MSG:
>>> /************************************************************
>>> SHUTDOWN_MSG: Shutting down NameNode at phil/127.0.1.1
>>> ************************************************************/
>>>
>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop dfs -put
>>> /home/philippe/synthetic_control.data testdata
>>>
>>> hadoop@phil:/usr/local/hadoop$ bin/hadoop jar
>>>
>>> /home/philippe/workspace/MahoutJava/examples/build/apache-mahout-examples-0.1-dev.job
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>> 08/10/29 18:28:45 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:28:46 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 1
>>> 08/10/29 18:28:47 INFO mapred.JobClient: Running job:
>>> job_200810291828_0002
>>> 08/10/29 18:28:48 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:28:54 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:28:55 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:28:56 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0002
>>> 08/10/29 18:28:56 INFO mapred.JobClient: Counters: 7
>>> 08/10/29 18:28:56 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes read=291644
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     HDFS bytes written=323660
>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:28:56 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map input bytes=288374
>>> 08/10/29 18:28:56 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:28:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:28:56 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:28:56 INFO mapred.JobClient: Running job:
>>> job_200810291828_0003
>>> 08/10/29 18:28:57 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:29:03 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:29:05 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:29:10 INFO mapred.JobClient:  map 100% reduce 100%
>>> 08/10/29 18:29:11 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0003
>>> 08/10/29 18:29:11 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:29:11 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes read=323660
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     HDFS bytes written=9657
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes read=36119
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Local bytes written=72300
>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:29:11 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input groups=1
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine output records=28
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce output records=7
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output bytes=943020
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Combine input records=1732
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Map output records=1732
>>> 08/10/29 18:29:11 INFO mapred.JobClient:     Reduce input records=28
>>> 08/10/29 18:29:11 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:29:11 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:29:12 INFO mapred.JobClient: Running job:
>>> job_200810291828_0004
>>> 08/10/29 18:29:13 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:29:20 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:29:22 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:29:27 INFO mapred.JobClient:  map 100% reduce 100%
>>> 08/10/29 18:29:28 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0004
>>> 08/10/29 18:29:28 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:29:28 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes read=342974
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     HDFS bytes written=3002539
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes read=3018455
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Local bytes written=6036972
>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:29:28 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input groups=7
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine output records=0
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce output records=1591
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output bytes=3008903
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Combine input records=0
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Map output records=1591
>>> 08/10/29 18:29:28 INFO mapred.JobClient:     Reduce input records=1591
>>> 08/10/29 18:29:28 INFO kmeans.KMeansDriver: Iteration 0
>>> 08/10/29 18:29:28 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:29:28 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:29:28 INFO mapred.JobClient: Running job:
>>> job_200810291828_0005
>>> 08/10/29 18:29:29 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:29:35 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:29:37 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:29:41 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0005
>>> 08/10/29 18:29:41 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:29:41 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes read=342974
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     HDFS bytes written=8205
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes read=23227
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Local bytes written=46516
>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:29:41 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input groups=7
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine output records=10
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce output records=7
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output bytes=1136504
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Combine input records=600
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:29:41 INFO mapred.JobClient:     Reduce input records=10
>>> 08/10/29 18:29:41 INFO kmeans.KMeansDriver: Iteration 1
>>> 08/10/29 18:29:41 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:29:41 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:29:42 INFO mapred.JobClient: Running job:
>>> job_200810291828_0006
>>> 08/10/29 18:29:43 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:29:50 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:29:51 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:29:55 INFO mapred.JobClient:  map 100% reduce 100%
>>> 08/10/29 18:29:56 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0006
>>> 08/10/29 18:29:56 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:29:56 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes read=340070
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     HDFS bytes written=8242
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes read=21265
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Local bytes written=42592
>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:29:56 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input groups=7
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine output records=10
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce output records=7
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output bytes=1023966
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Combine input records=600
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:29:56 INFO mapred.JobClient:     Reduce input records=10
>>> 08/10/29 18:29:56 INFO kmeans.KMeansDriver: Iteration 2
>>> 08/10/29 18:29:56 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:29:56 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:29:56 INFO mapred.JobClient: Running job:
>>> job_200810291828_0007
>>> 08/10/29 18:29:57 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:30:03 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:30:05 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:30:09 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0007
>>> 08/10/29 18:30:09 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:30:09 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes read=340144
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     HDFS bytes written=8280
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes read=21085
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Local bytes written=42232
>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:30:09 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input groups=7
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine output records=10
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce output records=7
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output bytes=1023681
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Combine input records=600
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:30:09 INFO mapred.JobClient:     Reduce input records=10
>>> 08/10/29 18:30:09 INFO kmeans.KMeansDriver: Iteration 3
>>> 08/10/29 18:30:09 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:30:09 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:30:09 INFO mapred.JobClient: Running job:
>>> job_200810291828_0008
>>> 08/10/29 18:30:10 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:30:17 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:30:18 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:30:22 INFO mapred.JobClient:  map 100% reduce 100%
>>> 08/10/29 18:30:23 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0008
>>> 08/10/29 18:30:23 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:30:23 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes read=340220
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     HDFS bytes written=8250
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes read=21339
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Local bytes written=42740
>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:30:23 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input groups=7
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine output records=10
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce output records=7
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output bytes=1028419
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Combine input records=600
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:30:23 INFO mapred.JobClient:     Reduce input records=10
>>> 08/10/29 18:30:23 INFO kmeans.KMeansDriver: Iteration 4
>>> 08/10/29 18:30:23 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:30:23 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:30:24 INFO mapred.JobClient: Running job:
>>> job_200810291828_0009
>>> 08/10/29 18:30:25 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:30:31 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:30:33 INFO mapred.JobClient:  map 100% reduce 0%
>>> 08/10/29 18:30:37 INFO mapred.JobClient:  map 100% reduce 100%
>>> 08/10/29 18:30:38 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0009
>>> 08/10/29 18:30:38 INFO mapred.JobClient: Counters: 16
>>> 08/10/29 18:30:38 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes read=340160
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     HDFS bytes written=8200
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes read=21219
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Local bytes written=42500
>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched reduce tasks=1
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:30:38 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input groups=7
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine output records=10
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce output records=7
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output bytes=1024899
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Combine input records=600
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:30:38 INFO mapred.JobClient:     Reduce input records=10
>>> 08/10/29 18:30:38 INFO kmeans.KMeansDriver: Clustering
>>> 08/10/29 18:30:38 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:30:38 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:30:38 INFO mapred.JobClient: Running job:
>>> job_200810291828_0010
>>> 08/10/29 18:30:39 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:30:45 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:30:47 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0010
>>> 08/10/29 18:30:47 INFO mapred.JobClient: Counters: 7
>>> 08/10/29 18:30:47 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes read=340060
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     HDFS bytes written=1020535
>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:30:47 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map input bytes=323660
>>> 08/10/29 18:30:47 INFO mapred.JobClient:     Map output records=600
>>> 08/10/29 18:30:47 WARN mapred.JobClient: Use GenericOptionsParser for
>>> parsing the arguments. Applications should implement Tool for the same.
>>> 08/10/29 18:30:47 INFO mapred.FileInputFormat: Total input paths to
>>> process
>>> : 2
>>> 08/10/29 18:30:48 INFO mapred.JobClient: Running job:
>>> job_200810291828_0011
>>> 08/10/29 18:30:49 INFO mapred.JobClient:  map 0% reduce 0%
>>> 08/10/29 18:30:56 INFO mapred.JobClient:  map 50% reduce 0%
>>> 08/10/29 18:30:57 INFO mapred.JobClient: Job complete:
>>> job_200810291828_0011
>>> 08/10/29 18:30:57 INFO mapred.JobClient: Counters: 7
>>> 08/10/29 18:30:57 INFO mapred.JobClient:   File Systems
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes read=1020535
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     HDFS bytes written=325460
>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Job Counters
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Launched map tasks=2
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Data-local map tasks=2
>>> 08/10/29 18:30:57 INFO mapred.JobClient:   Map-Reduce Framework
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input records=600
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map input bytes=1020535
>>> 08/10/29 18:30:57 INFO mapred.JobClient:     Map output records=600
>>>
>>> On Wed, Oct 29, 2008 at 11:10 AM, Philippe Lamarche <
>>> philippe.lamarche@gmail.com> wrote:
>>>
>>>  I will!
>>>>
>>>>
>>>> On 10/29/08, Grant Ingersoll <gs...@apache.org> wrote:
>>>>
>>>>>
>>>>> Philippe, can you try the patch suggested by Arun Murthy on
>>>>> core-user@hadoop.a.o?  See
>>>>> http://issues.apache.org/jira/browse/HADOOP-4277
>>>>>
>>>>> I'm pretty swamped at the moment w/ ApacheCon coming up next week, but
>>>>> if
>>>>> it does fix the issue, then maybe we should move forward to the 18.2
>>>>> candidate (I don't think it has been released yet, those guys have a
>>>>> pretty
>>>>> sophisticated build process going)
>>>>>
>>>>> -Grant
>>>>>
>>>>> On Oct 28, 2008, at 7:19 AM, Philippe Lamarche wrote:
>>>>>
>>>>> Ubuntu Linux 2.6.24, with java-6-sun-1.6.0.07.
>>>>>
>>>>>>
>>>>>> On Tue, Oct 28, 2008 at 7:03 AM, Grant Ingersoll <gsingers@apache.org
>>>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>
>>>>>>> Just a single machine.  I didn't think we were using features either.
>>>>>>> Are you saying you can run the example using 0.18.1?
>>>>>>>
>>>>>>> BTW, Philippe, what JVM, O/S, etc. are you using?
>>>>>>>
>>>>>>> -Grant
>>>>>>>
>>>>>>>
>>>>>>> On Oct 27, 2008, at 11:55 PM, Jeff Eastman wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>>> Are you guys running on real Hadoop arrays? I can run the synthetic
>>>>>>>> control example just fine on a single machine. That code is just
>>>>>>>> trying
>>>>>>>> to
>>>>>>>> read a vector from a string. I'd be surprised if we were using any
>>>>>>>> "features" but will watch the threads.
>>>>>>>>
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Grant Ingersoll wrote:
>>>>>>>>
>>>>>>>> I started a thread on core-user@hadoop.a.o:
>>>>>>>>
>>>>>>>>> http://hadoop.markmail.org/message/cczunzfhpcqz6pis
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Oct 27, 2008, at 9:49 PM, Grant Ingersoll wrote:
>>>>>>>>>
>>>>>>>>>> OK, I can confirm that the exact same code works with 0.17.2 and not
>>>>>>>>>> w/ 0.18.1.  So, it sounds like a bug in Hadoop, or we are relying on
>>>>>>>>>> incorrect behavior in Hadoop.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 27, 2008, at 9:33 PM, Grant Ingersoll wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Oct 26, 2008, at 10:46 AM, Philippe Lamarche wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately, I went straight from 0.17.2 to 0.18.1.  It was
>>>>>>>>>>> working
>>>>>>>>>>>
>>>>>>>>>>>  on
>>>>>>>>>>>> 0.17.2.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> BTW, are you saying the same exact code was working on 0.17.2 or
>>>>>>>>>>>>
>>>>>>>>>>> are
>>>>>>>>>>> you referring to some older Mahout code that worked on 17.2?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  On Sun, Oct 26, 2008 at 9:48 AM, Grant Ingersoll <
>>>>>>>>>>>> gsingers@apache.org
>>>>>>>>>>>>
>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> Did this work with 0.18.0 or other prior versions for you?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Oct 25, 2008, at 7:23 PM, Philippe Lamarche wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>  I just updated to hadoop 0.18.1 and got a clean version of
>>>>>>>>>>>>>> mahout
>>>>>>>>>>>>>> from
>>>>>>>>>>>>>> svn.
>>>>>>>>>>>>>> However, I am having problems with KMeans, that can be traced
>>>>>>>>>>>>>> down
>>>>>>>>>>>>>> to :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger:
>>>>>>>>>>>>>> Merging
>>>>>>>>>>>>>> 2 sorted segments
>>>>>>>>>>>>>> 2008-10-25 19:10:16,987 INFO org.apache.hadoop.mapred.Merger:
>>>>>>>>>>>>>> Down
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> the last merge-pass, with 2 segments left of total size: 5011
>>>>>>>>>>>>>> bytes
>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 WARN
>>>>>>>>>>>>>> org.apache.hadoop.mapred.ReduceTask:
>>>>>>>>>>>>>> attempt_200810251826_0013_r_000000_0 Merge of the inmemory
>>>>>>>>>>>>>> files
>>>>>>>>>>>>>> threw
>>>>>>>>>>>>>> an exception: java.io.IOException: Intermedate merge failed
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
>>>>>>>>>>>>>> Caused by: java.lang.NumberFormatException: For input string: "["
>>>>>>>>>>>>>> at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
>>>>>>>>>>>>>> at java.lang.Double.parseDouble(Double.java:510)
>>>>>>>>>>>>>> at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
>>>>>>>>>>>>>> at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
>>>>>>>>>>>>>> at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)
>>>>>>>>>>>>>> ... 1 more
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 2008-10-25 19:10:16,999 INFO
>>>>>>>>>>>>>> org.apache.hadoop.mapred.ReduceTask:
>>>>>>>>>>>>>> In-memory merge complete: 0 files left.
>>>>>>>>>>>>>> 2008-10-25 19:10:17,000 WARN
>>>>>>>>>>>>>> org.apache.hadoop.mapred.TaskTracker:
>>>>>>>>>>>>>> Error running child
>>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The
>>>>>>>>>>>>>> reduce
>>>>>>>>>>>>>> copier failed
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is while running the synthetic_control.data example, but I
>>>>>>>>>>>>>> have
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> same problems with any other input data.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am able to run other map-reduce jobs without problems.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here is the output of the jar task:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> hadoop@philippe-vaio:/usr/local/hadoop$ bin/hadoop jar
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /home/philippe/workspace/MahoutJava/examples/dist/apache-mahout-examples-0.1-dev.jar
>>>>>>>>>>>>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>>>>>>>>>>>>>> 08/10/25 19:09:27 WARN mapred.JobClient: Use
>>>>>>>>>>>>>> GenericOptionsParser
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>> parsing the arguments. Applications should implement Tool for
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> process
>>>>>>>>>>>>>> : 1
>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> process
>>>>>>>>>>>>>> : 1
>>>>>>>>>>>>>> 08/10/25 19:09:28 INFO mapred.JobClient: Running job:
>>>>>>>>>>>>>> job_200810251826_0010
>>>>>>>>>>>>>> 08/10/25 19:09:29 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:31 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Job complete:
>>>>>>>>>>>>>> job_200810251826_0010
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Counters: 7
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>> read=291644
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>> written=323660
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Launched map
>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Data-local map
>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>> bytes=288374
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>> 08/10/25 19:09:32 WARN mapred.JobClient: Use
>>>>>>>>>>>>>> GenericOptionsParser
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>> parsing the arguments. Applications should implement Tool for
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> process
>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> process
>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>> 08/10/25 19:09:32 INFO mapred.JobClient: Running job:
>>>>>>>>>>>>>> job_200810251826_0011
>>>>>>>>>>>>>> 08/10/25 19:09:33 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:37 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:39 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:44 INFO mapred.JobClient:  map 100% reduce 16%
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Job complete:
>>>>>>>>>>>>>> job_200810251826_0011
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>> read=323660
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>> written=1447
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes
>>>>>>>>>>>>>> read=1389
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Local bytes
>>>>>>>>>>>>>> written=37878
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched reduce
>>>>>>>>>>>>>> tasks=1
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Launched map
>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Data-local map
>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input
>>>>>>>>>>>>>> groups=1
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine output
>>>>>>>>>>>>>> records=29
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce output
>>>>>>>>>>>>>> records=1
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>> bytes=943020
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>> bytes=323660
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Combine input
>>>>>>>>>>>>>> records=1760
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>> records=1732
>>>>>>>>>>>>>> 08/10/25 19:09:52 INFO mapred.JobClient:     Reduce input
>>>>>>>>>>>>>> records=1
>>>>>>>>>>>>>> 08/10/25 19:09:53 WARN mapred.JobClient: Use
>>>>>>>>>>>>>> GenericOptionsParser
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>> parsing the arguments. Applications should implement Tool for
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> process
>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> process
>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>> 08/10/25 19:09:53 INFO mapred.JobClient: Running job:
>>>>>>>>>>>>>> job_200810251826_0012
>>>>>>>>>>>>>> 08/10/25 19:09:54 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:56 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:09:58 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Job complete:
>>>>>>>>>>>>>> job_200810251826_0012
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient: Counters: 16
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   File Systems
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>> read=326554
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     HDFS bytes
>>>>>>>>>>>>>> written=1137260
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes
>>>>>>>>>>>>>> read=1147358
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Local bytes
>>>>>>>>>>>>>> written=2304490
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Job Counters
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched reduce
>>>>>>>>>>>>>> tasks=1
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Launched map
>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Data-local map
>>>>>>>>>>>>>> tasks=2
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:   Map-Reduce Framework
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input
>>>>>>>>>>>>>> groups=1
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine output
>>>>>>>>>>>>>> records=0
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce output
>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>> bytes=1139660
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map input
>>>>>>>>>>>>>> bytes=323660
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Combine input
>>>>>>>>>>>>>> records=0
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Map output
>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.JobClient:     Reduce input
>>>>>>>>>>>>>> records=600
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO kmeans.KMeansDriver: Iteration 0
>>>>>>>>>>>>>> 08/10/25 19:10:02 WARN mapred.JobClient: Use
>>>>>>>>>>>>>> GenericOptionsParser
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>> parsing the arguments. Applications should implement Tool for
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> same.
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> process
>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>> 08/10/25 19:10:02 INFO mapred.FileInputFormat: Total input
>>>>>>>>>>>>>> paths
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> process
>>>>>>>>>>>>>> : 2
>>>>>>>>>>>>>> 08/10/25 19:10:03 INFO mapred.JobClient: Running job:
>>>>>>>>>>>>>> job_200810251826_0013
>>>>>>>>>>>>>> 08/10/25 19:10:04 INFO mapred.JobClient:  map 0% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:10:08 INFO mapred.JobClient:  map 50% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:10:09 INFO mapred.JobClient:  map 100% reduce 0%
>>>>>>>>>>>>>> 08/10/25 19:10:21 INFO mapred.JobClient: Task Id :
>>>>>>>>>>>>>> attempt_200810251826_0013_r_000000_0, Status : FAILED
>>>>>>>>>>>>>> java.io.IOException: attempt_200810251826_0013_r_000000_0The
>>>>>>>>>>>>>> reduce
>>>>>>>>>>>>>> copier
>>>>>>>>>>>>>> failed
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
>>>>>>>>>>>>>> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am not sure if I am doing something wrong here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the help,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Philippe.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --------------------------
>>>>>>>>>>>>>>
>>>>>>>>>>>>> Grant Ingersoll
>>>>>>>>>>>>> Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New
>>>>>>>>>>>>> Orleans.
>>>>>>>>>>>>> http://www.lucenebootcamp.com
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Lucene Helpful Hints:
>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>
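
A note on the stack trace quoted above: it shows KMeansCombiner.reduce being called from ReduceTask$ReduceCopier.combineAndSpill, i.e. the combiner running on the reduce side during the in-memory merge. One possible reading (an assumption, not something confirmed in this thread) is that a combiner whose output format differs from its input format breaks as soon as it is fed its own output. The sketch below is plain Java with no Hadoop or Mahout dependency; the class name, the "[x, y]" point encoding and the "count<TAB>[sum]" format are all hypothetical and are not Mahout's actual encoding.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerContractSketch {

    // Hypothetical point encoding "[1.0, 2.0]" (not Mahout's real format).
    static double[] decodePoint(String s) {
        String[] parts = s.substring(1, s.length() - 1).split(",");
        double[] v = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            v[i] = Double.parseDouble(parts[i].trim());
        }
        return v;
    }

    static String encodePoint(double[] v) {
        return Arrays.toString(v);
    }

    // Fragile: reads bare points but writes "count<TAB>[sum]", so its
    // output is not valid input for a second combine pass.
    static String fragileCombine(List<String> values) {
        double[] sum = null;
        int count = 0;
        for (String value : values) {
            double[] p = decodePoint(value);
            if (sum == null) {
                sum = new double[p.length];
            }
            for (int i = 0; i < p.length; i++) {
                sum[i] += p[i];
            }
            count++;
        }
        return count + "\t" + encodePoint(sum);
    }

    // Re-entrant: reads and writes the same "count<TAB>[sum]" format, so it
    // can be applied zero, one, or many times without changing the result.
    static String reentrantCombine(List<String> values) {
        double[] sum = null;
        long count = 0;
        for (String value : values) {
            int tab = value.indexOf('\t');
            count += Long.parseLong(value.substring(0, tab));
            double[] p = decodePoint(value.substring(tab + 1));
            if (sum == null) {
                sum = new double[p.length];
            }
            for (int i = 0; i < p.length; i++) {
                sum[i] += p[i];
            }
        }
        return count + "\t" + encodePoint(sum);
    }

    public static void main(String[] args) {
        String once = fragileCombine(Arrays.asList("[1.0, 2.0]", "[3.0, 4.0]"));
        try {
            fragileCombine(Arrays.asList(once)); // second pass over its own output
            System.out.println("fragile combiner survived a second pass");
        } catch (NumberFormatException e) {
            System.out.println("fragile combiner failed on its own output: " + e.getMessage());
        }

        String first = reentrantCombine(Arrays.asList("1\t[1.0, 2.0]", "1\t[3.0, 4.0]"));
        String second = reentrantCombine(Arrays.asList(first, "1\t[5.0, 6.0]"));
        System.out.println(second);
    }
}
```

If this reading applies, the mapper would have to emit values already in the combiner's output format (here, a count of 1 with each point), so that the reducer sees the same format whether or not the combiner ran. Whether that is what actually changed between 0.17.2 and 0.18.1 is not established in this thread.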