Posted to user@mahout.apache.org by Tommy Chheng <to...@gmail.com> on 2010/05/31 18:28:30 UTC

mahout quickstart-kmeans script sequencefile parameter

  Hi,
I'm using the quickstart-kmeans.sh script from 
https://issues.apache.org/jira/browse/MAHOUT-390 to run the example 
kmeans. I'm on mahout trunk.

It fails on the SequenceFile generation step:
$ ./bin/mahout seqdirectory -i ./work/reuters-out/ -o ./work/reuters-out-seqdir -c UTF-8
no HADOOP_CONF_DIR or HADOOP_HOME set, running locally
Exception in thread "main" org.apache.commons.cli2.OptionException: Unexpected -i while processing Options
         at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
         at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:205)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:174)

Alternatively, I tried ./bin/mahout seqdirectory --input 
./work/reuters-out/ -o ./work/reuters-out-seqdir -c UTF-8, but I get the 
same "Unexpected" error, this time for --input.


-- 

@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com
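
A minimal sketch, not part of the original post: when a driver fails this way, the rejected flag can be pulled out of the commons-cli2 message so it is easy to compare against the options the driver prints in its help. The function name rejected_option is hypothetical, and the sketch only assumes the message has the form "Unexpected <flag> while processing Options" seen in the trace above.

```shell
#!/bin/sh
# Hypothetical helper: read command output on stdin and print the first flag
# named in an "Unexpected <flag> while processing Options" message.
# Prints nothing when no such message is present.
rejected_option() {
  sed -n 's/.*Unexpected \(--*[A-Za-z][A-Za-z0-9-]*\) while processing Options.*/\1/p' | head -n 1
}

# Usage sketch (assumes ./bin/mahout exists in the current directory):
#   ./bin/mahout seqdirectory -i ./work/reuters-out/ \
#       -o ./work/reuters-out-seqdir -c UTF-8 2>&1 | rejected_option
```

The trace goes to stderr, which is why the usage line redirects it with 2>&1 before piping.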


Re: mahout quickstart-kmeans script sequencefile parameter

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Good, and thank you for posting your findings. I've updated the wiki to 
reflect the revised arguments for k-Means and will update the other 
clustering pages shortly.

Jeff




Re: mahout quickstart-kmeans script sequencefile parameter

Posted by Tommy Chheng <to...@gmail.com>.
Yes, it did print the help. I was just leaving a comment in case anyone 
else runs into the error.




Re: mahout quickstart-kmeans script sequencefile parameter

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Yes, the options have changed a bit recently and that script evidently 
has not been updated yet. We are working to make all the algorithm 
command lines more uniform and still have a ways to go to accomplish 
that goal.

-w should now be -ow; it causes the output directory to be overwritten
-x (--maxIter) is also required, though perhaps it should not be. Do you 
really want kmeans to run forever?

If you run the driver with incorrect arguments, does it not print out 
the help information for you?
Jeff
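
A rough sketch of the two option changes described above, applied to an old-style kmeans argument list. The helper name migrate_kmeans_args and the idea of passing a default iteration cap as the first argument are assumptions made for illustration; only the -w to -ow rename and the -x / --maxIter requirement come from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print an old-style kmeans argument list, one argument
# per line, with the two changes from this thread applied:
#   -w is rewritten to -ow
#   --maxIter <default> is appended when neither -x nor --maxIter is present
# Caveat: an option *value* that is literally "-w" would also be rewritten.
migrate_kmeans_args() {
  default_iters=$1   # assumption: the caller chooses the iteration cap
  shift
  has_max_iter=no
  for arg in "$@"; do
    case $arg in
      -w) printf '%s\n' -ow ;;
      -x|--maxIter) has_max_iter=yes; printf '%s\n' "$arg" ;;
      *) printf '%s\n' "$arg" ;;
    esac
  done
  if [ "$has_max_iter" = no ]; then
    printf '%s\n' --maxIter "$default_iters"
  fi
}

# Usage sketch:
#   migrate_kmeans_args 20 -i ./in -c ./work/clusters -o ./out -k 20 -w
```

The output is only a rewritten argument list; checking it against the help the driver prints is still the safer way to confirm the current option names.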




Re: mahout quickstart-kmeans script sequencefile parameter

Posted by Tommy Chheng <to...@gmail.com>.
  Thanks Drew,
I started a new EC2 instance with the mahout trunk and got it working. 
There is a problem with the last line though.

The last line in the script gave an error:
../bin/mahout kmeans -i ./work/reuters-out-seqdir-sparse/tfidf/vectors/ -c ./work/clusters -o ./work/reuters-kmeans -k 20 -w

org.apache.commons.cli2.OptionException: Unexpected -w while processing Options

Removing -w and adding --maxIter fixes it:
../bin/mahout kmeans -i ./work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./work/clusters -o ./work/reuters-kmeans -k 20 --maxIter 20

I added a comment to
https://issues.apache.org/jira/browse/MAHOUT-390




Re: mahout quickstart-kmeans script sequencefile parameter

Posted by Drew Farris <dr...@gmail.com>.
Very strange:

drew@skirnir:~/mahout/svn-trunk$ svn info
Path: .
URL: https://svn.apache.org/repos/asf/mahout/trunk
Repository Root: https://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 950859
[...]
drew@skirnir:~/mahout/svn-trunk$ ./bin/mahout seqdirectory -i ./work/reuters-out -o ./work/reuters-out-seqdir -c UTF-8
no HADOOP_CONF_DIR or HADOOP_HOME set, running locally
[..]
drew@skirnir:~/mahout/svn-trunk$ ls ./work/reuters-out-seqdir
chunk-0

To be absolutely certain nothing old is lurking in your target directories,
try 'mvn clean install' to rebuild and see if your results differ. If you
prefer, you can skip test execution with 'mvn clean install -DskipTests=true'.

If that doesn't work, run 'mvn -v' and post the results -- that might
provide some clues.

- Drew
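
As a small illustration of the check in the listing above (a successful run left a chunk-0 file in the output directory), here is a hedged sketch. The helper name has_chunks is hypothetical, and it assumes that at least one chunk-* file is a reasonable sign that seqdirectory wrote output.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given directory contains at least one
# chunk-* entry, as in the "ls ./work/reuters-out-seqdir" listing above.
has_chunks() {
  for f in "$1"/chunk-*; do
    # With no match the glob stays literal, so -e fails and we fall through.
    [ -e "$f" ] && return 0
  done
  return 1
}

# Usage sketch with the commands from this message (not run here):
#   mvn clean install -DskipTests=true
#   ./bin/mahout seqdirectory -i ./work/reuters-out -o ./work/reuters-out-seqdir -c UTF-8
#   has_chunks ./work/reuters-out-seqdir || echo "no chunk files written" >&2
```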


Re: mahout quickstart-kmeans script sequencefile parameter

Posted by Tommy Chheng <to...@gmail.com>.
  I updated from svn and ran mvn install, but I'm still getting a 
command-line parsing error on the seqdirectory command.
$svn info
Path: .
URL: http://svn.apache.org/repos/asf/mahout/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 950329
Node Kind: directory
Schedule: normal
Last Changed Author: srowen
Last Changed Rev: 950049
Last Changed Date: 2010-06-01 05:55:49 -0700 (Tue, 01 Jun 2010)

$./bin/mahout seqdirectory -i ./work/reuters-out/ -o ./work/reuters-out-seqdir -c UTF-8
no HADOOP_CONF_DIR or HADOOP_HOME set, running locally
Exception in thread "main" org.apache.commons.cli2.OptionException: Unexpected -i while processing Options
         at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
         at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:205)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
         at java.lang.reflect.Method.invoke(Method.java:597)
         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:174)

@tommychheng
Programmer and UC Irvine Graduate Student
Find a great grad school based on research interests: http://gradschoolnow.com

On 6/1/10 12:43 PM, Grant Ingersoll wrote:
> Can you try doing an SVN update and then "mvn install" and then run again?

Re: mahout quickstart-kmeans script sequencefile parameter

Posted by Grant Ingersoll <gs...@apache.org>.
Can you try doing an SVN update and then "mvn install" and then run again?
