Posted to user@mahout.apache.org by Sebastian Benthall <sb...@gmail.com> on 2012/03/19 02:35:21 UTC

IllegalArgumentExceptions from LDA

Hi all,

I'm trying to use Mahout for LDA but have been getting
IllegalArgumentExceptions like this: https://gist.github.com/2089285

I'm using this script that is based on the cluster-reuters.sh example:
https://gist.github.com/2088888

I've poked around a bit without success, but maybe this warning
indicates what's wrong:

WARN lda.LDADriver: can't determine number of words; no vectors in
mahout-work-hduser/toy-seqdir-sparse-lda/tf-vectors

Prior to that, I see that I get this message:
INFO common.HadoopUtil: Deleting
mahout-work-hduser/toy-seqdir-sparse-lda/tf-vectors

which I guess explains why there would be no vectors in that directory.
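
A quick way to confirm whether that directory really is empty at the
point the LDA step starts is sketched below. This is only a rough check
under stated assumptions: it assumes the job runs in Hadoop local mode,
so the path is on the local filesystem rather than HDFS, and that the
vectorizer writes its output as files named part-*. The helper name
count_nonempty_part_files is hypothetical and is not part of Mahout.

```python
import os


def count_nonempty_part_files(vector_dir):
    """Count non-empty files named part-* directly under vector_dir.

    Hypothetical helper, not a Mahout API. Returns 0 if the directory
    does not exist (for example, because it was deleted earlier).
    """
    if not os.path.isdir(vector_dir):
        return 0
    count = 0
    for name in os.listdir(vector_dir):
        path = os.path.join(vector_dir, name)
        # Assumption: output files follow the part-* naming convention.
        if name.startswith("part-") and os.path.isfile(path):
            if os.path.getsize(path) > 0:
                count += 1
    return count


if __name__ == "__main__":
    # Path taken from the warning message above.
    tf_dir = "mahout-work-hduser/toy-seqdir-sparse-lda/tf-vectors"
    found = count_nonempty_part_files(tf_dir)
    print("non-empty part files in %s: %d" % (tf_dir, found))
```

A non-empty part file does not prove that it contains vectors, since a
sequence file with no records still has a header, but a count of zero
would match the warning.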

Is this expected behavior?

The only other thing I can think of comes from the comment on the method
that seems to be generating the warning:
http://mail-archives.apache.org/mod_mbox/mahout-commits/201109.mbox/%3C20110930114400.9DC732388A02@eris.apache.org%3E

Is the problem that I don't have enough documents? I started with a toy
data set and have since been adding other files to my document directory
to fill it out, but that hasn't solved the problem.

Thanks in advance,
Seb