Posted to dev@mahout.apache.org by "Jack Pay (JIRA)" <ji...@apache.org> on 2013/02/03 22:18:12 UTC

[jira] [Created] (MAHOUT-1147) CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix

Jack Pay created MAHOUT-1147:
--------------------------------

             Summary: CVB Bug in CVB0Driver causes doc/topic distributions to be trained on random matrix
                 Key: MAHOUT-1147
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1147
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.7
         Environment: Eclipse IDE
Java code base
CVB0Driver Class
setModelPaths(Job job, Path modelPath) - method
            Reporter: Jack Pay
             Fix For: 0.7


Problem:
When training the doc/topic model, no paths to the term/topic model are found (the lookup returns null).
These paths are set using setModelPaths in CVB0Driver.


Reason for Problem:
A variety of Job instances call this method.
The Job is passed to the method instead of the Configuration object that was given to the Job.
The Configuration is then retrieved from the Job instance itself.
I believe that this Configuration instance is a clone of the original.
This is a problem because MODEL_PATHS is set on the clone, which is then discarded when the Job completes.
The original Configuration has no MODEL_PATHS value set, so looking it up returns null.
The code specifies that if it cannot find a model, a new random matrix is used instead. This happens every time, because MODEL_PATHS is never set on the Configuration instance that is actually read.
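The copy behaviour described above can be illustrated with a small self-contained sketch. It does not use Hadoop or Mahout: Conf and Job below are hypothetical stand-ins that only model a Job taking a private copy of the configuration it is built from. The MODEL_PATHS key value, the /tmp/model path, and chooseModel (a stand-in for the fall-back-to-random-matrix logic) are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for Hadoop's Configuration and Job; they only model
// the copy-on-construction behaviour described above, not the real API.
class CopyPitfallDemo {

    // Minimal key/value configuration (stand-in for a Configuration).
    static class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        // Copy constructor: the new Conf gets its own snapshot of the properties.
        Conf(Conf other) {
            props.putAll(other.props);
        }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    // Stand-in for a Job: it copies the Conf it is given, so changes made
    // through getConfiguration() never reach the caller's Conf.
    static class Job {
        private final Conf conf;

        Job(Conf conf) { this.conf = new Conf(conf); }

        Conf getConfiguration() { return conf; }
    }

    // Hypothetical key value; CVB0Driver defines its own MODEL_PATHS constant.
    static final String MODEL_PATHS = "MODEL_PATHS";

    // Buggy pattern: writes the model path into the Job's private copy.
    static void setModelPaths(Job job, String modelPath) {
        job.getConfiguration().set(MODEL_PATHS, modelPath);
    }

    // Hypothetical lookup: no model path means falling back to a random model.
    static String chooseModel(Conf conf) {
        String paths = conf.get(MODEL_PATHS);
        return paths == null ? "random matrix" : "model at " + paths;
    }

    public static void main(String[] args) {
        Conf original = new Conf();
        Job job = new Job(original);
        setModelPaths(job, "/tmp/model");

        System.out.println(job.getConfiguration().get(MODEL_PATHS)); // /tmp/model
        System.out.println(original.get(MODEL_PATHS));               // null
        System.out.println(chooseModel(original));                   // random matrix
    }
}
```

The path is visible only through the Job's copy; anything that later reads the original Conf sees null and takes the random-matrix branch.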

Solution:
Do not pass the Job to the setModelPaths method; instead, pass the Configuration instance that was passed into the method which created the Job.
i.e.
change from:
setModelPaths(Job job, Path modelPath)

to:
setModelPaths(Configuration conf, Path modelPath)

And change all calling methods accordingly (obviously).
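A matching sketch of the proposed fix, again using hypothetical stand-ins rather than the Hadoop or Mahout API: setModelPaths now takes the Conf and writes to it directly. The sketch assumes the path is set before the Job is constructed, so both the original Conf and the Job's copy see it; a Job built before the call would still hold a copy without MODEL_PATHS.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins modelling the fixed call pattern; not the real API.
class ConfFixDemo {

    // Minimal key/value configuration with a copying constructor.
    static class Conf {
        private final Map<String, String> props = new HashMap<>();

        Conf() {}

        Conf(Conf other) { props.putAll(other.props); }

        void set(String key, String value) { props.put(key, value); }

        String get(String key) { return props.get(key); }
    }

    // Stand-in for a Job that snapshots the Conf it is constructed with.
    static class Job {
        private final Conf conf;

        Job(Conf conf) { this.conf = new Conf(conf); }

        Conf getConfiguration() { return conf; }
    }

    // Hypothetical key value, as in the previous sketch.
    static final String MODEL_PATHS = "MODEL_PATHS";

    // Fixed pattern: write into the caller's Conf, not into a Job's copy.
    static void setModelPaths(Conf conf, String modelPath) {
        conf.set(MODEL_PATHS, modelPath);
    }

    // Hypothetical lookup with the same random-matrix fallback.
    static String chooseModel(Conf conf) {
        String paths = conf.get(MODEL_PATHS);
        return paths == null ? "random matrix" : "model at " + paths;
    }

    public static void main(String[] args) {
        Conf original = new Conf();
        // Set the path before building the Job so the Job's copy inherits it.
        setModelPaths(original, "/tmp/model");
        Job job = new Job(original);

        System.out.println(chooseModel(original));               // model at /tmp/model
        System.out.println(chooseModel(job.getConfiguration())); // model at /tmp/model
    }
}
```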

What little testing I have done so far suggests that this solves the problem.
 
