Posted to dev@mahout.apache.org by Jeff Eastman <jd...@windwardsolutions.com> on 2010/06/16 03:13:51 UTC

Testing Wikipedia?

I've made changes (patch in MAHOUT-167e.patch) to migrate the 
WikipediaDatasetCreatorEtc to Hadoop 0.20.2. The changes compile and the 
existing unit tests all run. But I had to port new 0.20 versions of 
MultipleOutputFormat and MultipleTextOutputFormat to do this, and there 
are no unit tests for any of the Wikipedia code in this package. 
Further, the code snippets to run the full example in the wiki 
(https://cwiki.apache.org/MAHOUT/wikipediabayesexample.html) are 
obsolete, and build-deprecated.xml is no longer in trunk. This makes 
verifying the correctness of my port pretty difficult, for me at least, 
since this is all unfamiliar code. What shall I do?

A. commit it, since the unit tests all run, and hope somebody else will 
verify the example
B. get help to run the example to verify it is correct, then commit it
C. leave the patch in jira and move on to utils

I'm loath to do A and would prefer to do B; however, C is what I'm going 
to have to do in the short term due to my schedule.

Jeff