Posted to user@mahout.apache.org by vineeth <vi...@gmail.com> on 2012/10/18 07:11:31 UTC
mahout 0.5 to 0.7 commandline parameter of lda
Hello,
I am following the procedure on this website, which was written for
Mahout 0.5:
http://theglassicon.com/computing/machine-learning/running-lda-algorithm-mahout
It gives the complete procedure to get the probabilities of words and
topics using LDA. However, these steps do not work on Mahout 0.7. Can
someone point me to an updated version of the same steps, or provide the
alternative commands and parameters?
Thank You
Vineeth
Re: mahout 0.5 to 0.7 commandline parameter of lda
Posted by Jake Mannix <ja...@gmail.com>.
On Thu, Oct 18, 2012 at 9:16 AM, Vineeth <vi...@gmail.com> wrote:
> I am running LDA for the first time. I gave the following command to
> test it on the Reuters dataset, but I got an error:
>
> lda -i reuters-vectors/tf-vectors -o reuters-lda-sparse -k 10 -v 7000 -x
> 20 -ow
>
> hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running
> locally
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/home/vineeth_rakesh/src/mahout/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/home/vineeth_rakesh/src/mahout/examples/target/dependency/slf4j-jcl-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/home/vineeth_rakesh/src/mahout/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 12/10/18 12:11:17 ERROR driver.MahoutDriver: : Try the new Collapsed
> Variation Bayes LDA, try bin/mahout cvb or bin/mahout cvb0_local
>
> As I mentioned, this command seems to be for Mahout 0.5. If I have to
> use Collapsed Variational Bayes (CVB) LDA, how do I give the parameters?
> Are there any websites describing the usage of CVB LDA?
If you want a summary of all the command line options for the CVB
implementation, just run:
mahout cvb
A full invocation looks like this:
mahout cvb -i path/to/tf-vectors -o output_dir/lda_output -k <num_topics> \
-x <num_iterations> -a <smoothing alpha param> -e <smoothing eta param> \
-dict path/to/dictionary.file-0 -dt <"sequencefile" or "text"> \
--topic_model_temp_dir path/to/store/temp_state
num_iterations can be something like 20-30. The results are not too
sensitive to alpha or eta, but both should be pretty small: 0.01 or so
often seems to be the right order of magnitude for both of them. You have
to play with the values, because this implementation does not learn the
hyperparameters.
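As a rough illustration of how those flags fit together, here is a minimal Python sketch that only assembles and prints the command line; it does not run Mahout. The function name build_cvb_command, the output, dictionary, and temp paths, and the default values are assumptions made for the example. Only the flags themselves come from the message above.

```python
import shlex


def build_cvb_command(tf_vectors, output_dir, dictionary, temp_dir,
                      num_topics=10, num_iterations=20,
                      alpha=0.01, eta=0.01, dict_type="sequencefile"):
    """Assemble an argument list for `mahout cvb`.

    Hypothetical helper: it uses only the flags quoted in this thread and
    does not validate them against any particular Mahout release.
    """
    return [
        "mahout", "cvb",
        "-i", tf_vectors,
        "-o", output_dir,
        "-k", str(num_topics),
        "-x", str(num_iterations),   # 20-30 iterations is the suggested range
        "-a", str(alpha),            # smoothing alpha, small (about 0.01)
        "-e", str(eta),              # smoothing eta, small (about 0.01)
        "-dict", dictionary,
        "-dt", dict_type,            # "sequencefile" or "text"
        "--topic_model_temp_dir", temp_dir,
    ]


# All paths below are placeholders, not paths confirmed by the thread.
cmd = build_cvb_command(
    tf_vectors="reuters-vectors/tf-vectors",
    output_dir="reuters-cvb/lda_output",
    dictionary="reuters-vectors/dictionary.file-0",
    temp_dir="reuters-cvb/temp_state",
)
print(shlex.join(cmd))
```

Printing the command first makes it easy to check the parameters before pasting the line into a shell.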
Let me know if that works for you.
--
-jake
Re: mahout 0.5 to 0.7 commandline parameter of lda
Posted by Vineeth <vi...@gmail.com>.
I am running LDA for the first time. I gave the following command to
test it on the Reuters dataset, but I got an error:
lda -i reuters-vectors/tf-vectors -o reuters-lda-sparse -k 10 -v 7000 -x
20 -ow
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running
locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/vineeth_rakesh/src/mahout/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/vineeth_rakesh/src/mahout/examples/target/dependency/slf4j-jcl-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/vineeth_rakesh/src/mahout/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
12/10/18 12:11:17 ERROR driver.MahoutDriver: : Try the new Collapsed
Variation Bayes LDA, try bin/mahout cvb or bin/mahout cvb0_local
As I mentioned, this command seems to be for Mahout 0.5. If I have to
use Collapsed Variational Bayes (CVB) LDA, how do I give the parameters?
Are there any websites describing the usage of CVB LDA?
Re: mahout 0.5 to 0.7 commandline parameter of lda
Posted by Jake Mannix <ja...@gmail.com>.
For Mahout 0.7, the model files for LDA are just a
SequenceFile<IntWritable, VectorWritable>: the row numbers are the
topicIds, and the entries are the (un-normalized) probabilities for each
termId. You can dump them with:
bin/mahout vectordump --dictionary <path to dictionary file> \
--dictionaryType <either text or sequencefile> \
--input <path to model files> \
--vectorSize <num entries per topic you want to see> \
--sortVectors
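Since the entries are un-normalized, one way to read a dumped topic is to divide each entry by the row sum and then sort. The Python sketch below shows that step on made-up numbers. It assumes that normalizing by the row sum is the intended interpretation, and it does not parse real vectordump output. The names normalize_topic, top_terms, and toy_topic are hypothetical.

```python
def normalize_topic(weights):
    """Turn un-normalized term weights for one topic into probabilities.

    `weights` maps term -> non-negative weight. Dividing by the row sum is
    an assumption about how the model entries are meant to be read.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("topic row has no positive weight")
    return {term: w / total for term, w in weights.items()}


def top_terms(weights, n):
    """Return the n most probable (term, probability) pairs for one topic."""
    probs = normalize_topic(weights)
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy row for a single topic; the terms and weights are invented.
toy_topic = {"oil": 6.0, "price": 3.0, "bank": 1.0}

for term, p in top_terms(toy_topic, 2):
    print(f"{term}\t{p:.3f}")
```

The ordering of terms is the same before and after normalization, so the sorted dump already shows the top terms; normalizing only matters when the values need to be compared as probabilities.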
--
-jake