
Posted to user@mahout.apache.org by yoshihiro fujimoto <yo...@gmail.com> on 2012/12/18 11:38:11 UTC

Blank output at Dirichlet Process Clustering

Hi,

I'm using Mahout's Dirichlet Process Clustering.
With 37 input records, the output contains 0 records.
With 1800 input records, the output is normal (1800 output records).

What are your suggestions to solve this problem?

== Java code (the 37-record case, using mahout-core-0.7.jar and
mahout-math-0.7.jar)

DirichletDriver.run(conf,
    new Path("data/vector/vector.seq"),
    new Path("data/dirichlet"),
    new DistributionDescription(GaussianClusterDistribution.class.getName(),
        RandomAccessSparseVector.class.getName(),
        EuclideanDistanceMeasure.class.getName(),
        37),
    10, 2, 0.1, true, false, 0.1, false);

== log ( the case of 37 records)


2012/12/17 10:17:09 org.apache.hadoop.util.NativeCodeLoader#<clinit>:52
WARN: Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
Cluster Iterator running iteration 1 over priorPath:
data/dirichlet/clusters-0
2012/12/17 10:17:10
org.apache.hadoop.mapred.JobClient#copyAndConfigureFiles:667
WARN: Use GenericOptionsParser for parsing the arguments. Applications
should implement Tool for the same.
2012/12/17 10:17:10
org.apache.hadoop.mapreduce.lib.input.FileInputFormat#listStatus:237
INFO: Total input paths to process : 1
2012/12/17 10:17:10
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1288
INFO: Running job: job_local_0001
2012/12/17 10:17:10 org.apache.hadoop.util.ProcessTree#isSetsidSupported:63
INFO: setsid exited with exit code 0
2012/12/17 10:17:10 org.apache.hadoop.mapred.Task#initialize:534
INFO:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1f5b4d1
2012/12/17 10:17:10
org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:944
INFO: io.sort.mb = 100
2012/12/17 10:17:10
org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:956
INFO: data buffer = 79691776/99614720
2012/12/17 10:17:10
org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:957
INFO: record buffer = 262144/327680
2012/12/17 10:17:11
org.apache.hadoop.mapred.MapTask$MapOutputBuffer#flush:1284
INFO: Starting flush of map output
2012/12/17 10:17:11
org.apache.hadoop.mapred.MapTask$MapOutputBuffer#sortAndSpill:1466
INFO: Finished spill 0
2012/12/17 10:17:11 org.apache.hadoop.mapred.Task#done:847
INFO: Task:attempt_local_0001_m_000000_0 is done. And is in the process of
commiting
2012/12/17 10:17:11
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
INFO:  map 0% reduce 0%
2012/12/17 10:17:13
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO:
2012/12/17 10:17:13 org.apache.hadoop.mapred.Task#sendDone:959
INFO: Task 'attempt_local_0001_m_000000_0' done.
2012/12/17 10:17:13 org.apache.hadoop.mapred.Task#initialize:534
INFO:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3e926
2012/12/17 10:17:13
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO:
2012/12/17 10:17:13 org.apache.hadoop.mapred.Merger$MergeQueue#merge:390
INFO: Merging 1 sorted segments
2012/12/17 10:17:13 org.apache.hadoop.mapred.Merger$MergeQueue#merge:473
INFO: Down to the last merge-pass, with 1 segments left of total size:
414560 bytes
2012/12/17 10:17:13
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO:
2012/12/17 10:17:13 org.apache.hadoop.mapred.Task#done:847
INFO: Task:attempt_local_0001_r_000000_0 is done. And is in the process of
commiting
2012/12/17 10:17:13
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO:
2012/12/17 10:17:13 org.apache.hadoop.mapred.Task#commit:1000
INFO: Task attempt_local_0001_r_000000_0 is allowed to commit now
2012/12/17 10:17:13
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter#commitTask:173
INFO: Saved output of task 'attempt_local_0001_r_000000_0' to
data/dirichlet/clusters-1
2012/12/17 10:17:14
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
INFO:  map 100% reduce 0%
2012/12/17 10:17:16
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO: reduce > reduce
2012/12/17 10:17:16 org.apache.hadoop.mapred.Task#sendDone:959
INFO: Task 'attempt_local_0001_r_000000_0' done.
2012/12/17 10:17:17
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
INFO:  map 100% reduce 100%
2012/12/17 10:17:17
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1356
INFO: Job complete: job_local_0001
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:585
INFO: Counters: 20
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:587
INFO:   File Output Format Counters
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Bytes Written=379153
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:587
INFO:   FileSystemCounters
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     FILE_BYTES_READ=4679083
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     FILE_BYTES_WRITTEN=5169961
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:587
INFO:   File Input Format Counters
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Bytes Read=29486
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:587
INFO:   Map-Reduce Framework
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Map output materialized bytes=414564
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Map input records=37
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Reduce shuffle bytes=0
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Spilled Records=20
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Map output bytes=414518
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Total committed heap usage (bytes)=358350848
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     CPU time spent (ms)=0
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     SPLIT_RAW_BYTES=121
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Combine input records=0
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Reduce input records=10
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Reduce input groups=10
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Combine output records=0
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Physical memory (bytes) snapshot=0
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Reduce output records=10
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Virtual memory (bytes) snapshot=0
2012/12/17 10:17:17 org.apache.hadoop.mapred.Counters#log:589
INFO:     Map output records=10
Cluster Iterator running iteration 2 over priorPath:
data/dirichlet/clusters-1
2012/12/17 10:17:17
org.apache.hadoop.mapred.JobClient#copyAndConfigureFiles:667
WARN: Use GenericOptionsParser for parsing the arguments. Applications
should implement Tool for the same.
2012/12/17 10:17:17
org.apache.hadoop.mapreduce.lib.input.FileInputFormat#listStatus:237
INFO: Total input paths to process : 1
2012/12/17 10:17:17
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1288
INFO: Running job: job_local_0002
2012/12/17 10:17:17 org.apache.hadoop.mapred.Task#initialize:534
INFO:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@423d4f
2012/12/17 10:17:17
org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:944
INFO: io.sort.mb = 100
2012/12/17 10:17:17
org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:956
INFO: data buffer = 79691776/99614720
2012/12/17 10:17:17
org.apache.hadoop.mapred.MapTask$MapOutputBuffer#<init>:957
INFO: record buffer = 262144/327680
2012/12/17 10:17:17
org.apache.hadoop.mapred.MapTask$MapOutputBuffer#flush:1284
INFO: Starting flush of map output
2012/12/17 10:17:17
org.apache.hadoop.mapred.MapTask$MapOutputBuffer#sortAndSpill:1466
INFO: Finished spill 0
2012/12/17 10:17:17 org.apache.hadoop.mapred.Task#done:847
INFO: Task:attempt_local_0002_m_000000_0 is done. And is in the process of
commiting
2012/12/17 10:17:18
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
INFO:  map 0% reduce 0%
2012/12/17 10:17:20
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO:
2012/12/17 10:17:20 org.apache.hadoop.mapred.Task#sendDone:959
INFO: Task 'attempt_local_0002_m_000000_0' done.
2012/12/17 10:17:20 org.apache.hadoop.mapred.Task#initialize:534
INFO:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1a32ea4
2012/12/17 10:17:20
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO:
2012/12/17 10:17:20 org.apache.hadoop.mapred.Merger$MergeQueue#merge:390
INFO: Merging 1 sorted segments
2012/12/17 10:17:20 org.apache.hadoop.mapred.Merger$MergeQueue#merge:473
INFO: Down to the last merge-pass, with 1 segments left of total size:
402422 bytes
2012/12/17 10:17:20
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO:
2012/12/17 10:17:20 org.apache.hadoop.mapred.Task#done:847
INFO: Task:attempt_local_0002_r_000000_0 is done. And is in the process of
commiting
2012/12/17 10:17:20
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO:
2012/12/17 10:17:20 org.apache.hadoop.mapred.Task#commit:1000
INFO: Task attempt_local_0002_r_000000_0 is allowed to commit now
2012/12/17 10:17:20
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter#commitTask:173
INFO: Saved output of task 'attempt_local_0002_r_000000_0' to
data/dirichlet/clusters-2
2012/12/17 10:17:21
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
INFO:  map 100% reduce 0%
2012/12/17 10:17:23
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO: reduce > reduce
2012/12/17 10:17:23 org.apache.hadoop.mapred.Task#sendDone:959
INFO: Task 'attempt_local_0002_r_000000_0' done.
2012/12/17 10:17:24
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
INFO:  map 100% reduce 100%
2012/12/17 10:17:24
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1356
INFO: Job complete: job_local_0002
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:585
INFO: Counters: 20
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:587
INFO:   File Output Format Counters
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Bytes Written=379153
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:587
INFO:   FileSystemCounters
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     FILE_BYTES_READ=10176320
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     FILE_BYTES_WRITTEN=9851171
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:587
INFO:   File Input Format Counters
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Bytes Read=29486
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:587
INFO:   Map-Reduce Framework
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Map output materialized bytes=402426
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Map input records=37
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Reduce shuffle bytes=0
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Spilled Records=20
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Map output bytes=402380
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Total committed heap usage (bytes)=595066880
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     CPU time spent (ms)=0
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     SPLIT_RAW_BYTES=121
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Combine input records=0
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Reduce input records=10
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Reduce input groups=10
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Combine output records=0
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Physical memory (bytes) snapshot=0
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Reduce output records=10
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Virtual memory (bytes) snapshot=0
2012/12/17 10:17:24 org.apache.hadoop.mapred.Counters#log:589
INFO:     Map output records=10
2012/12/17 10:17:24
org.apache.hadoop.mapred.JobClient#copyAndConfigureFiles:667
WARN: Use GenericOptionsParser for parsing the arguments. Applications
should implement Tool for the same.
2012/12/17 10:17:24
org.apache.hadoop.mapreduce.lib.input.FileInputFormat#listStatus:237
INFO: Total input paths to process : 1
2012/12/17 10:17:24
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1288
INFO: Running job: job_local_0003
2012/12/17 10:17:24 org.apache.hadoop.mapred.Task#initialize:534
INFO:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@14d581b
2012/12/17 10:17:24 org.apache.hadoop.mapred.Task#done:847
INFO: Task:attempt_local_0003_m_000000_0 is done. And is in the process of
commiting
2012/12/17 10:17:24
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO:
2012/12/17 10:17:24 org.apache.hadoop.mapred.Task#commit:1000
INFO: Task attempt_local_0003_m_000000_0 is allowed to commit now
2012/12/17 10:17:24
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter#commitTask:173
INFO: Saved output of task 'attempt_local_0003_m_000000_0' to
data/dirichlet/clusteredPoints
2012/12/17 10:17:25
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
INFO:  map 0% reduce 0%
2012/12/17 10:17:27
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO:
2012/12/17 10:17:27 org.apache.hadoop.mapred.Task#sendDone:959
INFO: Task 'attempt_local_0003_m_000000_0' done.
2012/12/17 10:17:28
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
INFO:  map 100% reduce 0%
2012/12/17 10:17:28
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1356
INFO: Job complete: job_local_0003
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:585
INFO: Counters: 12
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
INFO:   File Output Format Counters
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
INFO:     Bytes Written=132
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
INFO:   File Input Format Counters
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
INFO:     Bytes Read=29486
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
INFO:   FileSystemCounters
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
INFO:     FILE_BYTES_READ=7433181
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
INFO:     FILE_BYTES_WRITTEN=6674179
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
INFO:   Map-Reduce Framework
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
INFO:     Map input records=37
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
INFO:     Physical memory (bytes) snapshot=0
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
INFO:     Spilled Records=0
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
INFO:     Total committed heap usage (bytes)=297533440
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
INFO:     CPU time spent (ms)=0
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
INFO:     Virtual memory (bytes) snapshot=0
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
INFO:     SPLIT_RAW_BYTES=121
2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
INFO:     Map output records=0
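
The key evidence in the log above is the counter pair of the last job (job_local_0003): Map input records=37 but Map output records=0. As a minimal sketch of how that check could be automated, the following Java snippet parses counter lines in the format pasted above and flags a job that read records but emitted none. The class and method names (CounterCheck, parseCounters, droppedAllRecords) are hypothetical and are not part of Mahout or Hadoop; the regular expression assumes the "INFO:     name=value" layout shown in this log.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Mahout or Hadoop.
class CounterCheck {
    // Matches counter lines such as "INFO:     Map input records=37".
    // Lines like "INFO: io.sort.mb = 100" do not match (space after '=').
    private static final Pattern COUNTER =
        Pattern.compile("^INFO:\\s+(.+?)=(\\d+)\\s*$");

    /** Collects name -> value for every counter line found in the log text. */
    static Map<String, Long> parseCounters(String log) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : log.split("\\R")) {
            Matcher m = COUNTER.matcher(line);
            if (m.matches()) {
                // A later occurrence of the same counter overwrites the earlier one.
                counters.put(m.group(1).trim(), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    /** True when the job read at least one record but wrote none. */
    static boolean droppedAllRecords(Map<String, Long> counters) {
        long in = counters.getOrDefault("Map input records", 0L);
        long out = counters.getOrDefault("Map output records", 0L);
        return in > 0 && out == 0;
    }

    public static void main(String[] args) {
        // Two counter lines copied from the job_local_0003 section of the log.
        String log = String.join("\n",
            "2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589",
            "INFO:     Map input records=37",
            "2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589",
            "INFO:     Map output records=0");
        Map<String, Long> counters = parseCounters(log);
        System.out.println(counters);
        System.out.println("dropped all records: " + droppedAllRecords(counters));
    }
}
```

Because later counter lines overwrite earlier ones, feeding the whole multi-job log to parseCounters reports the counters of the final job only; pass one job's section at a time to inspect the earlier iterations.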


Thanks

Yoshihiro

Re: Blank output at Dirichlet Process Clustering

Posted by yoshihiro fujimoto <yo...@gmail.com>.

I tried again with emitMostLikely set to true,
but the result did not change.

= src

DirichletDriver.run(conf,
    new Path("data/vector/vector.seq"),
    new Path("data/dirichlet"),
    new DistributionDescription(GaussianClusterDistribution.class.getName(),
        RandomAccessSparseVector.class.getName(),
        EuclideanDistanceMeasure.class.getName(),
        37),
    10, 2, 0.1, true, true, 0.1, false);


= log ( last iteration only)

INFO: Running job: job_local_0004
2012/12/19 10:32:33 org.apache.hadoop.mapred.Task#initialize:534
INFO:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@cd2192
2012/12/19 10:32:33 org.apache.hadoop.mapred.Task#done:847
INFO: Task:attempt_local_0004_m_000000_0 is done. And is in the process of
commiting
2012/12/19 10:32:33
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO:
2012/12/19 10:32:33 org.apache.hadoop.mapred.Task#commit:1000
INFO: Task attempt_local_0004_m_000000_0 is allowed to commit now
2012/12/19 10:32:33
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter#commitTask:173
INFO: Saved output of task 'attempt_local_0004_m_000000_0' to
data/dirichlet/clusteredPoints
2012/12/19 10:32:34
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
INFO:  map 0% reduce 0%
2012/12/19 10:32:36
org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
INFO:
2012/12/19 10:32:36 org.apache.hadoop.mapred.Task#sendDone:959
INFO: Task 'attempt_local_0004_m_000000_0' done.
2012/12/19 10:32:37
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
INFO:  map 100% reduce 0%
2012/12/19 10:32:37
org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1356
INFO: Job complete: job_local_0004
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:585
INFO: Counters: 12
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:587
INFO:   File Output Format Counters
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:589
INFO:     Bytes Written=132
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:587
INFO:   File Input Format Counters
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:589
INFO:     Bytes Read=29486
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:587
INFO:   FileSystemCounters
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:589
INFO:     FILE_BYTES_READ=10169137
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:589
INFO:     FILE_BYTES_WRITTEN=9014784
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:587
INFO:   Map-Reduce Framework
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:589
INFO:     Map input records=37
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:589
INFO:     Physical memory (bytes) snapshot=0
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:589
INFO:     Spilled Records=0
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:589
INFO:     Total committed heap usage (bytes)=183042048
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:589
INFO:     CPU time spent (ms)=0
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:589
INFO:     Virtual memory (bytes) snapshot=0
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:589
INFO:     SPLIT_RAW_BYTES=121
2012/12/19 10:32:37 org.apache.hadoop.mapred.Counters#log:589
INFO:     Map output records=0
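
This thread does not establish why the classification job emits no records. One thing that may be worth checking is the 0.1 value passed just after the emitMostLikely flag: if it acts as a minimum-probability threshold, a small data set whose per-cluster probabilities all fall below it would produce empty output regardless of emitMostLikely. That reading is an assumption to verify against the Mahout 0.7 source, not something confirmed here. The sketch below is a hypothetical, self-contained illustration of such a filter; it is not Mahout's code, and the class name, method names and probability values are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Mahout's implementation.
class ThresholdFilterSketch {
    /**
     * Returns the index of the most likely cluster, or -1 when no cluster
     * probability reaches the threshold (the point would then not be emitted).
     */
    static int mostLikelyAboveThreshold(double[] pdfPerCluster, double threshold) {
        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < pdfPerCluster.length; i++) {
            // NaN never compares greater, so an all-NaN vector yields -1.
            if (pdfPerCluster[i] > bestValue) {
                bestValue = pdfPerCluster[i];
                best = i;
            }
        }
        return (best >= 0 && bestValue >= threshold) ? best : -1;
    }

    /** Counts how many points would be emitted under the hypothetical filter. */
    static int countEmitted(List<double[]> points, double threshold) {
        int emitted = 0;
        for (double[] pdf : points) {
            if (mostLikelyAboveThreshold(pdf, threshold) >= 0) {
                emitted++;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<double[]> pdfs = new ArrayList<>();
        pdfs.add(new double[] {0.02, 0.05, 0.01}); // made-up probabilities
        pdfs.add(new double[] {0.00, 0.08, 0.03}); // made-up probabilities
        System.out.println("emitted at threshold 0.1: " + countEmitted(pdfs, 0.1));
        System.out.println("emitted at threshold 0.0: " + countEmitted(pdfs, 0.0));
    }
}
```

With these made-up values, no point passes at threshold 0.1 and both pass at threshold 0.0. If the real driver behaves similarly, rerunning with the threshold argument set to 0.0 would be a quick way to test the hypothesis.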

Thanks,

Yoshihiro

2012/12/19 praneet mhatre <pr...@gmail.com>

> Did you try setting the emitMostLikely option of the DirichletDriver to
> true?
> > INFO: Task:attempt_local_0003_m_000000_0 is done. And is in the process of
> > commiting
> > 2012/12/17 10:17:24
> > org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> > INFO:
> > 2012/12/17 10:17:24 org.apache.hadoop.mapred.Task#commit:1000
> > INFO: Task attempt_local_0003_m_000000_0 is allowed to commit now
> > 2012/12/17 10:17:24
> > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter#commitTask:173
> > INFO: Saved output of task 'attempt_local_0003_m_000000_0' to
> > data/dirichlet/clusteredPoints
> > 2012/12/17 10:17:25
> > org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> > INFO:  map 0% reduce 0%
> > 2012/12/17 10:17:27
> > org.apache.hadoop.mapred.LocalJobRunner$Job#statusUpdate:321
> > INFO:
> > 2012/12/17 10:17:27 org.apache.hadoop.mapred.Task#sendDone:959
> > INFO: Task 'attempt_local_0003_m_000000_0' done.
> > 2012/12/17 10:17:28
> > org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1301
> > INFO:  map 100% reduce 0%
> > 2012/12/17 10:17:28
> > org.apache.hadoop.mapred.JobClient#monitorAndPrintJob:1356
> > INFO: Job complete: job_local_0003
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:585
> > INFO: Counters: 12
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
> > INFO:   File Output Format Counters
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> > INFO:     Bytes Written=132
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
> > INFO:   File Input Format Counters
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> > INFO:     Bytes Read=29486
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
> > INFO:   FileSystemCounters
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> > INFO:     FILE_BYTES_READ=7433181
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> > INFO:     FILE_BYTES_WRITTEN=6674179
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:587
> > INFO:   Map-Reduce Framework
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> > INFO:     Map input records=37
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> > INFO:     Physical memory (bytes) snapshot=0
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> > INFO:     Spilled Records=0
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> > INFO:     Total committed heap usage (bytes)=297533440
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> > INFO:     CPU time spent (ms)=0
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> > INFO:     Virtual memory (bytes) snapshot=0
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> > INFO:     SPLIT_RAW_BYTES=121
> > 2012/12/17 10:17:28 org.apache.hadoop.mapred.Counters#log:589
> > INFO:     Map output records=0
> >
> >
> > Thanks
> >
> > Yoshihiro
> >
>
>
>
> --
> Praneet Mhatre
> Graduate Student
> Donald Bren School of ICS
> University of California, Irvine
>

Re: Blank output at Dirichlet Process Clustering

Posted by praneet mhatre <pr...@gmail.com>.

Did you try setting the emitMostLikely option of the DirichletDriver to
true?
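
For context, here is a minimal sketch of why this flag could matter. It
assumes (this is not taken from the Mahout source) that the final
clusteredPoints step writes a point to every cluster whose probability
exceeds the threshold when emitMostLikely is false, and to the single most
probable cluster when it is true. The class EmitSketch, the method emit, and
the sample probabilities are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of the two emission modes; NOT Mahout source code.
class EmitSketch {
    // Returns the indices of the clusters a point would be written to.
    // pdf holds one (hypothetical) probability per cluster and is assumed non-empty.
    static List<Integer> emit(double[] pdf, boolean emitMostLikely, double threshold) {
        List<Integer> out = new ArrayList<>();
        if (emitMostLikely) {
            // Always emit exactly one record: the most probable cluster.
            int best = 0;
            for (int i = 1; i < pdf.length; i++) {
                if (pdf[i] > pdf[best]) {
                    best = i;
                }
            }
            out.add(best);
        } else {
            // Emit one record per cluster above the threshold, possibly none.
            for (int i = 0; i < pdf.length; i++) {
                if (pdf[i] > threshold) {
                    out.add(i);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] pdf = {0.04, 0.09, 0.02};          // all below the 0.1 threshold
        System.out.println(emit(pdf, false, 0.1));  // prints []
        System.out.println(emit(pdf, true, 0.1));   // prints [1]
    }
}
```

Under that assumption, a point whose probabilities all fall below the
threshold produces no output record unless emitMostLikely is true, which
would be consistent with the "Map output records=0" counter in the original
post's log. In the call quoted below, the second boolean (the false just
before 0.1) appears to be the emitMostLikely argument.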


On Tue, Dec 18, 2012 at 2:38 AM, yoshihiro fujimoto <
yoshihiro.0906@gmail.com> wrote:

> Hi,
>
> I'm using Mahout for Dirichlet process clustering.
> With 37 input records, I get 0 output records.
> With 1800 input records, the output is normal (1800 output records).
>
> What are your suggestions to solve this problem?
>
> == Java code (the 37-record case, using mahout-core-0.7.jar and
> mahout-math-0.7.jar)
>
> DirichletDriver.run(conf,
>     new Path("data/vector/vector.seq"),
>     new Path("data/dirichlet"),
>     new DistributionDescription(
>         GaussianClusterDistribution.class.getName(),
>         RandomAccessSparseVector.class.getName(),
>         EuclideanDistanceMeasure.class.getName(),
>         37),
>     10, 2, 0.1, true, false, 0.1, false);
>
> Thanks
>
> Yoshihiro
>



-- 
Praneet Mhatre
Graduate Student
Donald Bren School of ICS
University of California, Irvine