Posted to user@mahout.apache.org by Claudio Reggiani <no...@gmail.com> on 2013/02/08 19:52:11 UTC

Running CVB command

Hello,

Following this tutorial:
https://cwiki.apache.org/MAHOUT/dimensional-reduction.html

I have successfully created my matrix from the Reuters data. Since I want to
test the CVB algorithm, I ran this command:

$MAHOUT_HOME/bin/mahout cvb -i reuters-vectors/tfidf-matrix/matrix -o
reuters-vectors/lda -k 5 -ow -nt 41807 --maxIter 1 -dict
reuters-vectors/dictionary.file-0 -dt reuters-vectors/lda-topic -mt
reuters-vectors/lda-temp

Note:
- I set --maxIter 1 because I just want to check that I can run the
algorithm with the proper parameters.

The job gets stuck and I don't know why: it doesn't finish, and at the same
time it isn't using any CPU. Here is the tail of the log:

13/02/08 19:41:08 INFO mapred.JobClient:  map 99% reduce 0%
13/02/08 19:41:08 INFO cvb.ModelTrainer: Initiating stopping of training
threadpool
13/02/08 19:41:08 INFO cvb.ModelTrainer: threadpool took: 1.163836ms
13/02/08 19:41:08 INFO cvb.ModelTrainer: writeModel.awaitTermination() took
173.872662ms
13/02/08 19:41:08 INFO mapred.Task: Task:attempt_local_0003_m_000000_0 is
done. And is in the process of commiting
13/02/08 19:41:08 INFO mapred.LocalJobRunner:
13/02/08 19:41:08 INFO mapred.Task: Task attempt_local_0003_m_000000_0 is
allowed to commit now
13/02/08 19:41:08 INFO output.FileOutputCommitter: Saved output of task
'attempt_local_0003_m_000000_0' to reuters-vectors/lda-topic
13/02/08 19:41:11 INFO mapred.LocalJobRunner:
13/02/08 19:41:11 INFO mapred.LocalJobRunner:
13/02/08 19:41:11 INFO mapred.Task: Task 'attempt_local_0003_m_000000_0'
done.
13/02/08 19:41:11 INFO mapred.JobClient:  map 100% reduce 0%
13/02/08 19:41:11 INFO mapred.JobClient: Job complete: job_local_0003
13/02/08 19:41:11 INFO mapred.JobClient: Counters: 12
13/02/08 19:41:11 INFO mapred.JobClient:   File Output Format Counters
13/02/08 19:41:11 INFO mapred.JobClient:     Bytes Written=1185853
13/02/08 19:41:11 INFO mapred.JobClient:   File Input Format Counters
13/02/08 19:41:11 INFO mapred.JobClient:     Bytes Read=15326617
13/02/08 19:41:11 INFO mapred.JobClient:   FileSystemCounters
13/02/08 19:41:11 INFO mapred.JobClient:     FILE_BYTES_READ=124124017
13/02/08 19:41:11 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=97148688
13/02/08 19:41:11 INFO mapred.JobClient:   Map-Reduce Framework
13/02/08 19:41:11 INFO mapred.JobClient:     Map input records=21578
13/02/08 19:41:11 INFO mapred.JobClient:     Physical memory (bytes)
snapshot=0
13/02/08 19:41:11 INFO mapred.JobClient:     Spilled Records=0
13/02/08 19:41:11 INFO mapred.JobClient:     Total committed heap usage
(bytes)=181207040
13/02/08 19:41:11 INFO mapred.JobClient:     CPU time spent (ms)=0
13/02/08 19:41:11 INFO mapred.JobClient:     Virtual memory (bytes)
snapshot=0
13/02/08 19:41:11 INFO mapred.JobClient:     SPLIT_RAW_BYTES=159
13/02/08 19:41:11 INFO mapred.JobClient:     Map output records=21578
13/02/08 19:41:11 INFO driver.MahoutDriver: Program took 190584 ms
(Minutes: 3.1764)

Any suggestions?

Thanks
Claudio

Re: Running CVB command

Posted by Wilson Chu <wi...@teltel.com>.
Jake Mannix <jake.mannix <at> gmail.com> writes:

> 
> If it ends with "total training time..." it's done.  The JVM isn't exiting,
> but I bet if you check your HDFS (or local FS if running without hadoop),
> you'll see it has already created and populated the output directories.
> 
> What version of Mahout are you running, are running on trunk?
> 

The results were written to the output directory.
The command just did not quit, so I can't run the next command.

I pulled the trunk with svn co http://svn.apache.org/repos/asf/mahout/trunk

I just tried the new trunk today.  Same problem.
I also tried both java-7-openjdk-amd64 and java-6-openjdk-amd64.  No luck.
Here is my command line:

export MAHOUT_LOCAL=true
bin/mahout cvb0_local -i out/matrix/matrix \
 -d out/sparseVectors/dictionary.file-0 \
 -m 20 -a 0.5 -top 10 -do out/cvb/do_out -to out/cvb/to_out
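
Until the cause of the hang is found, one possible workaround for "can't run the next command" is to put a time limit on the command. This is a minimal sketch, assuming GNU coreutils `timeout` is installed; the helper name `run_with_limit` and any limit value are hypothetical, not part of Mahout. Treating a timeout as success can also hide a run that really did not finish, so the output directories should still be checked afterwards.

```shell
# Run a command under a time limit. GNU timeout exits with status 124 when
# the limit is reached; that case is reported and treated as success so a
# script can move on to its next command. Any other status is passed through.
run_with_limit() {
  limit="$1"
  shift
  status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after $limit; continuing" >&2
    return 0
  fi
  return "$status"
}
```

It would be called as `run_with_limit 30m bin/mahout cvb0_local ...` with the same arguments as above; the 30m limit is only an example and must be longer than the real training time.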



Re: Running CVB command

Posted by Jake Mannix <ja...@gmail.com>.
If it ends with "total training time..." it's done.  The JVM isn't exiting,
but I bet if you check your HDFS (or local FS if running without hadoop),
you'll see it has already created and populated the output directories.

What version of Mahout are you running? Are you running on trunk?
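
The check suggested above can be sketched as follows, assuming a local-filesystem run (MAHOUT_LOCAL=true) and the output paths from the cvb0_local command earlier in the thread. The helper name `check_output` is hypothetical; on a cluster, `hadoop fs -ls` on the same paths would be the equivalent.

```shell
# Report whether an output path exists and is non-empty on the local
# filesystem. Accepts either a directory or a plain file.
check_output() {
  path="$1"
  if [ -d "$path" ] && [ -n "$(ls -A "$path" 2>/dev/null)" ]; then
    echo "populated: $path"
  elif [ -f "$path" ] && [ -s "$path" ]; then
    echo "populated: $path"
  else
    echo "missing or empty: $path"
    return 1
  fi
}
```

For the run above this would be `check_output out/cvb/do_out` and `check_output out/cvb/to_out`.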


On Thu, Feb 21, 2013 at 2:04 PM, Wilson Chu <wi...@teltel.com> wrote:

>
> >
> > The job get stuck and I don't know why, which means that the job doesn't
> > finish and at the same it's not using CPU. Here it is the log's tail:
> >
>
> I saw the same.  The command does not quit back to shell.  Not using CPU.
> After one day waiting still the same.
>
> ...
> INFO: total training time time: 22811.054701ms
> Feb 21, 2013 3:43:05 AM org.apache.hadoop.io.compress.CodecPool
> getCompressor
> INFO: Got brand-new compressor
> Feb 21, 2013 3:43:05 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: printTopics time: 323.275654ms
> Feb 21, 2013 3:43:05 AM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Program took 25559 ms (Minutes: 0.4259833333333333)
>
> Anyone has clue on what is wrong?
>
> --Wilson
>
>
>
>
>


-- 

  -jake

Re: Running CVB command

Posted by Wilson Chu <wi...@teltel.com>.
> 
> The job get stuck and I don't know why, which means that the job doesn't
> finish and at the same it's not using CPU. Here it is the log's tail:
> 

I see the same thing.  The command never returns to the shell, and it is not
using any CPU.  After waiting a full day, nothing has changed.

...
INFO: total training time time: 22811.054701ms
Feb 21, 2013 3:43:05 AM org.apache.hadoop.io.compress.CodecPool getCompressor
INFO: Got brand-new compressor
Feb 21, 2013 3:43:05 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: printTopics time: 323.275654ms
Feb 21, 2013 3:43:05 AM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 25559 ms (Minutes: 0.4259833333333333)

Does anyone have a clue what is wrong?

--Wilson
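
One way to investigate a JVM that has finished its work but does not exit is to take a thread dump and look for threads that are not marked as daemon, since a live non-daemon thread keeps the JVM running. Whether that is the cause here is an assumption; nothing in this thread confirms it. The sketch below assumes a JDK with `jps` and `jstack` on the PATH; the helper name `non_daemon_threads` is hypothetical.

```shell
# Read a jstack thread dump on stdin and print the header lines of threads
# that are not marked "daemon". Thread headers start with a double quote.
# JVM-internal threads (for example "VM Thread") are also listed, because
# jstack does not mark them as daemon; the ones to look at are application
# threads such as pool workers.
non_daemon_threads() {
  grep '^"' | grep -v ' daemon '
}

# Usage (the PID of the stuck process can be found with `jps -l`):
#   jstack <pid> | non_daemon_threads
```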