Posted to dev@mahout.apache.org by Jeff Eastman <je...@Narus.com> on 2011/05/20 02:04:57 UTC

Is LDA Broken?

I'm running build-reuters option 2 and the LDA runs to maxIterations (20) without ever producing a non-zero Log Likelihood. This is not the behavior that I recall from earlier runs and seems quite unlikely to be correct.
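One way a run can reach maxIterations without ever converging is shown in the hypothetical sketch below. It assumes a driver that stops once the relative change in log likelihood between iterations drops below a threshold. That is an assumption for illustration, not Mahout's actual LDA driver code, and `ConvergenceCheckSketch`, `hasConverged` and `run` are invented names. If the log likelihood is stuck at 0, the ratio becomes 0.0 / 0.0, which is NaN in Java. Every comparison against NaN is false, so such a check never fires.

```java
/**
 * Hypothetical convergence check, NOT Mahout's LDA driver code. It assumes a
 * driver that stops once the relative change in log likelihood falls below delta.
 */
public class ConvergenceCheckSketch {

    /** True when the relative change between two log likelihoods is below delta. */
    static boolean hasConverged(double previousLL, double currentLL, double delta) {
        double relativeChange = Math.abs((previousLL - currentLL) / previousLL);
        // With previousLL == currentLL == 0.0 this is 0.0 / 0.0 == NaN,
        // and NaN < delta is always false, so the check never passes.
        return relativeChange < delta;
    }

    /**
     * Simulates up to maxIterations passes. llPerIteration supplies the log
     * likelihood seen at each pass, and the last value repeats once the array
     * is exhausted. Returns the number of iterations actually run.
     */
    static int run(double[] llPerIteration, int maxIterations, double delta) {
        double previousLL = Double.NaN;
        for (int iteration = 1; iteration <= maxIterations; iteration++) {
            double currentLL = llPerIteration[Math.min(iteration - 1, llPerIteration.length - 1)];
            if (iteration > 1 && hasConverged(previousLL, currentLL, delta)) {
                return iteration;
            }
            previousLL = currentLL;
        }
        return maxIterations;
    }

    public static void main(String[] args) {
        double[] healthy = {-1000.0, -900.0, -899.99};
        double[] stuckAtZero = {0.0};
        System.out.println(run(healthy, 20, 0.001));     // prints 3
        System.out.println(run(stuckAtZero, 20, 0.001)); // prints 20
    }
}
```

A zero log likelihood is therefore a useful clue in its own right. Whatever produces it would also defeat a ratio-based stopping rule like this one.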


Re: Is LDA Broken?

Posted by Grant Ingersoll <gs...@apache.org>.
We should set up a Jenkins job to run the examples on a regular basis and to validate the output.  I've been doing some Jenkins work lately; I will see if I can get to it after Revolution.

-Grant

On May 20, 2011, at 12:43 PM, Jake Mannix wrote:

> Looks like Grant got a fix posted?  Has anyone else tried it?
> 
>  -jake
> 
> On Fri, May 20, 2011 at 9:32 AM, Sean Owen <sr...@gmail.com> wrote:
> 
>> I think we definitely need to figure out whether it's a bug or some other
>> confusion. If it's a doesn't-work-at-all bug yes probably the kind of thing
>> that needs a fix ASAP in which case write up all you know and everyone will
>> pile in to look at it.
>> 
>> On Fri, May 20, 2011 at 5:29 PM, Jeff Eastman <je...@narus.com> wrote:
>> 
>>> Is this an issue that should be fixed before we release? It seems to be
>>> broken to me.
>>> 
>>> -----Original Message-----
>>> From: Jeff Eastman [mailto:jeastman@narus.com]
>>> Sent: Thursday, May 19, 2011 5:05 PM
>>> To: dev@mahout.apache.org
>>> Subject: Is LDA Broken?
>>> 
>>> I'm running build-reuters option 2 and the LDA runs to maxIterations (20)
>>> without ever producing a non-zero Log Likelihood. This is not the
>> behavior
>>> that I recall from earlier runs and seems quite unlikely to be correct.
>>> 
>>> 
>> 



Re: Is LDA Broken?

Posted by Grant Ingersoll <gs...@apache.org>.
I've subsequently restored it but you can run it from examples/bin too.

-Grant

On May 20, 2011, at 1:58 PM, Jeff Eastman wrote:

> It does seem these two symptoms are of the same problem. I applied the patch, however, and now neither option runs. It appears the cd is off but I can't see where.
> 
> [dev@devbox mahout-distribution-0.5]$ time ./examples/bin/build-reuters.sh 
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. lda clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> ./examples/bin/build-reuters.sh: line 54: ./bin/mahout: No such file or directory
> ./examples/bin/build-reuters.sh: line 64: ./bin/mahout: No such file or directory
> 
> 
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org] 
> Sent: Friday, May 20, 2011 10:50 AM
> To: dev@mahout.apache.org
> Subject: Re: Is LDA Broken?
> 
> Likely so, see MAHOUT-694.  
> 
> 
> On May 20, 2011, at 1:39 PM, Sean Owen wrote:
> 
>> Oh sorry these are the same issue? Great!

--------------------------------------------
Grant Ingersoll
Join the LUCENE REVOLUTION
Lucene & Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org


RE: Is LDA Broken?

Posted by Jeff Eastman <je...@Narus.com>.
Grant,

Your most recent patch fixes the local reuters problem I reported earlier. Thanks. Kmeans still has the index exception on the cluster, but I gather that probably won't ever work?

-----Original Message-----
From: Jeff Eastman [mailto:jeastman@narus.com] 
Sent: Friday, May 20, 2011 3:05 PM
To: dev@mahout.apache.org
Subject: RE: Is LDA Broken?

I will test that in a couple of hours. In meetings till then.

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org] 
Sent: Friday, May 20, 2011 1:45 PM
To: dev@mahout.apache.org
Subject: Re: Is LDA Broken?

Jeff, I just put up a patch on M-694 that closes the input stream.  Does that fix it for you?  I can't repro. it here (which is weird, b/c my ulimit reports max # of files as 256)

-Grant

On May 20, 2011, at 4:09 PM, Jeff Eastman wrote:

> Don't know what is happening. I rebooted my Linux VM, did a clean mahout build, zapped bin/work, and got the same result. Will have to debug more later today...
> 
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org] 
> Sent: Friday, May 20, 2011 12:54 PM
> To: dev@mahout.apache.org
> Subject: Re: Is LDA Broken?
> 
> Hmm, that's weird.  I might suggest doing "clean install" first as well as deleting your examples/bin/work directory.  
> 
> On May 20, 2011, at 3:41 PM, Jeff Eastman wrote:
> 
>> I uncommented line 39 and am getting the same errors (index error with kmeans, 0 LL with LDA) as before. I am running on real clusters (CDH3 & MapR). Trying to run locally, I get this curious output. I don't have much time today to pursue it (in meetings all day) but will do my best:
>> 
>> [dev@devbox mahout]$ ./examples/bin/build-reuters.sh 
>> Please select a number to choose the corresponding clustering algorithm
>> 1. kmeans clustering
>> 2. lda clustering
>> Enter your choice : 1
>> ok. You chose 1 and we'll use kmeans Clustering
>> Downloading Reuters-21578
>> % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>>                                Dload  Upload   Total   Spent    Left  Speed
>> 100 7959k  100 7959k    0     0  1145k      0  0:00:06  0:00:06 --:--:-- 1135k
>> Extracting...
>> no HADOOP_HOME set, running locally
>> May 20, 2011 12:35:51 PM org.slf4j.impl.JCLLoggerAdapter warn
>> WARNING: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
>> Deleting all files in ./examples/bin/work/reuters-out
>> May 20, 2011 12:35:56 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Program took 4690 ms
>> no HADOOP_HOME set, running locally
>> May 20, 2011 12:35:57 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=./examples/bin/work/reuters-out/, --keyPrefix=, --output=./examples/bin/work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
>> Exception in thread "main" java.lang.IllegalStateException: java.io.FileNotFoundException: /home/dev/workspace/mahout/examples/bin/work/reuters-out/reut2-018.sgm-835.txt (Too many open files)
>>       at org.apache.mahout.text.SequenceFilesFromDirectoryFilter.accept(SequenceFilesFromDirectoryFilter.java:79)
>>       at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:724)
>>       at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:746)
>>       at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:76)
>>       at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:106)
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>       at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:81)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>       at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>       at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>       at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> >> ^C
>> 
>> -----Original Message-----
>> From: Grant Ingersoll [mailto:gsingers@apache.org] 
>> Sent: Friday, May 20, 2011 11:17 AM
>> To: dev@mahout.apache.org
>> Subject: Re: Is LDA Broken?
>> 
>> yeah, sorry.  I commented out line 39: cd examples/bin
>> 
>> On May 20, 2011, at 1:58 PM, Jeff Eastman wrote:
>> 
>>> It does seem these two symptoms are of the same problem. I applied the patch; however, and now neither option runs. It appears the cd is off but I can't see where.
>>> 
>>> [dev@devbox mahout-distribution-0.5]$ time ./examples/bin/build-reuters.sh 
>>> Please select a number to choose the corresponding clustering algorithm
>>> 1. kmeans clustering
>>> 2. lda clustering
>>> Enter your choice : 1
>>> ok. You chose 1 and we'll use kmeans Clustering
>>> ./examples/bin/build-reuters.sh: line 54: ./bin/mahout: No such file or directory
>>> ./examples/bin/build-reuters.sh: line 64: ./bin/mahout: No such file or directory
>>> 
>>> 
>>> -----Original Message-----
>>> From: Grant Ingersoll [mailto:gsingers@apache.org] 
>>> Sent: Friday, May 20, 2011 10:50 AM
>>> To: dev@mahout.apache.org
>>> Subject: Re: Is LDA Broken?
>>> 
>>> Likely so, see MAHOUT-694.  
>>> 
>>> 
>>> On May 20, 2011, at 1:39 PM, Sean Owen wrote:
>>> 
>>>> Oh sorry these are the same issue? Great!
>>>> On May 20, 2011 5:44 PM, "Jake Mannix" <ja...@gmail.com> wrote:
>>>>> Looks like Grant got a fix posted? Has anyone else tried it?
>>>>> 
>>>>> -jake
>>>>> 
>>>>> On Fri, May 20, 2011 at 9:32 AM, Sean Owen <sr...@gmail.com> wrote:
>>>>> 
>>>>>> I think we definitely need to figure out whether it's a bug or some other
>>>>>> confusion. If it's a doesn't-work-at-all bug yes probably the kind of
>>>> thing
>>>>>> that needs a fix ASAP in which case write up all you know and everyone
>>>> will
>>>>>> pile in to look at it.
>>>>>> 
>>>>>> On Fri, May 20, 2011 at 5:29 PM, Jeff Eastman <je...@narus.com> wrote:
>>>>>> 
>>>>>>> Is this an issue that should be fixed before we release? It seems to be
>>>>>>> broken to me.
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Jeff Eastman [mailto:jeastman@narus.com]
>>>>>>> Sent: Thursday, May 19, 2011 5:05 PM
>>>>>>> To: dev@mahout.apache.org
>>>>>>> Subject: Is LDA Broken?
>>>>>>> 
>>>>>>> I'm running build-reuters option 2 and the LDA runs to maxIterations
>>>> (20)
>>>>>>> without ever producing a non-zero Log Likelihood. This is not the
>>>>>> behavior
>>>>>>> that I recall from earlier runs and seems quite unlikely to be correct.
>>>>>>> 
>>>>>>> 
>>>>>> 
>>> 
>>> 
>> 
>> --------------------------------------------
>> Grant Ingersoll
>> Join the LUCENE REVOLUTION
>> Lucene & Solr User Conference
>> May 25-26, San Francisco
>> www.lucenerevolution.org
>> 
> 
> --------------------------------------------
> Grant Ingersoll
> Join the LUCENE REVOLUTION
> Lucene & Solr User Conference
> May 25-26, San Francisco
> www.lucenerevolution.org
> 

--------------------------
Grant Ingersoll
Lucene Revolution -- Lucene and Solr User Conference
May 25-26 in San Francisco
www.lucenerevolution.org


RE: Is LDA Broken?

Posted by Jeff Eastman <je...@Narus.com>.
I will test that in a couple of hours. In meetings till then.



Re: Is LDA Broken?

Posted by Ted Dunning <te...@gmail.com>.
This really sounds like you have bits of different hadoop versions floating
around.  This could be in the cluster itself rather than on your dev
machine.  Client and server here probably mean map-reduce program and
datanode or namenode.

It is even conceivably a CDH bug that isn't tickled by other code.  That
seems really unlikely, though, both because I don't understand how we would
tickle a piece of code nobody else has and because the RPC code is pretty
centralized.


Re: Is LDA Broken?

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Version Mismatch between client and server... command aborted.

It seems to be thrown by the DFSAdmin class run() method when an 
RPC.VersionMismatch exception is thrown. I verified I have the same CDH3 
client and server versions so I'm baffled.  I did upgrade my client CDH3 
from the February 21 beta to the March 25 production bits, but I still 
get the error. All of the many other jobs I've submitted to that cluster 
from my client VM work without issue. Is 20 newsgroups doing anything 
unusual?

On 5/21/11 3:46 AM, Grant Ingersoll wrote:
> Jeff, what's your other exception?
>


Re: Is LDA Broken?

Posted by Grant Ingersoll <gs...@apache.org>.
Sorry, yes, finally{} :-).  Brain not functioning yet.  Time to go eat breakfast.




Re: Is LDA Broken?

Posted by Sean Owen <sr...@gmail.com>.
"finally"? Yes of course, that's what we should be doing.
I guess I'd be more surprised if we had many finalize() to write.


Re: Is LDA Broken?

Posted by Grant Ingersoll <gs...@apache.org>.
I did:

finalize{
	IOUtils.closeStream();
}

The InputStream in this particular case is actually one we opened.  I'll commit the patch.

Jeff, what's your other exception?




Re: Is LDA Broken?

Posted by Sean Owen <sr...@gmail.com>.
That's a decent pattern. The streams in question here are implemented in
third-party libraries (e.g. SequenceFile.Reader) so not sure we can change
them. Just being vigilant about close() in finally blocks is the first and
important step.
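Sean's point about closing streams in finally blocks can be sketched in plain Java. This is a minimal illustration, not the MAHOUT-694 patch. `closeQuietly` is a hypothetical local helper standing in for a quiet-close utility such as Hadoop's `IOUtils.closeStream(Closeable)`: it closes the stream and ignores any `IOException`. The directory loop only mimics per-file reads like the one that failed in `SequenceFilesFromDirectoryFilter.accept` with "Too many open files". Each file's handle is released before the next file is opened.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

public class CloseInFinallySketch {

    /** Hypothetical stand-in for a quiet-close helper such as Hadoop's IOUtils.closeStream. */
    static void closeQuietly(Closeable stream) {
        if (stream == null) {
            return;
        }
        try {
            stream.close();
        } catch (IOException ignored) {
            // Nothing useful to do if close() itself fails.
        }
    }

    /** Reads the first line of a file and always releases the file handle. */
    static String firstLine(File file) throws IOException {
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new FileReader(file));
            return reader.readLine();
        } finally {
            // Runs on normal return and on exceptions alike, so the
            // descriptor is freed before the caller opens the next file.
            closeQuietly(reader);
        }
    }

    /** Reads every regular file in a directory, holding one open handle at a time. */
    static int countReadableFiles(File dir) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            return 0;
        }
        int count = 0;
        for (File f : files) {
            if (f.isFile()) {
                firstLine(f);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(countReadableFiles(dir) + " files read");
    }
}
```

On Java 7 and later, try-with-resources gives the same guarantee with less boilerplate. The explicit finally form shown here also works on Java 6.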


Re: Is LDA Broken?

Posted by Lance Norskog <go...@gmail.com>.
In Solr the practice is:

finalize() {
   if (! closed()) {
        log(WARN, "Hey! You didn't call close!");
        close();
    }
}
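Lance's Solr-style safety net might look like this in Java. It is a sketch under stated assumptions. `LeakWarningResource` and `closeIfLeaked` are hypothetical names, not Solr's code, and `java.util.logging` stands in for whatever logger a project actually uses. Consistent with Sean's caveat later in the thread, finalize() here only reports and cleans up a leak after the fact. The JVM makes no promise about when, or whether, it runs, so callers still have to close explicitly.

```java
import java.io.Closeable;
import java.util.logging.Logger;

/** Hypothetical resource that warns if it is garbage-collected without close(). */
public class LeakWarningResource implements Closeable {

    private static final Logger LOG = Logger.getLogger(LeakWarningResource.class.getName());

    // volatile: finalize() runs on the JVM's finalizer thread.
    private volatile boolean closed = false;

    public boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        // Idempotent, so a second close() is harmless.
        // A real resource would release its file handle here.
        closed = true;
    }

    /** Returns true if the resource had not been closed and was closed by this check. */
    boolean closeIfLeaked() {
        if (!closed) {
            LOG.warning("Hey! You didn't call close!");
            close();
            return true;
        }
        return false;
    }

    @Override
    protected void finalize() throws Throwable {
        try {
            closeIfLeaked(); // diagnostic only; never the primary close path
        } finally {
            super.finalize();
        }
    }
}
```

`Object.finalize()` has been deprecated since Java 9. On newer JVMs, `java.lang.ref.Cleaner` is the usual replacement for this kind of last-resort check.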

-- 
Lance Norskog
goksron@gmail.com

Re: Is LDA Broken?

Posted by Ted Dunning <te...@gmail.com>.
+1

(with exclamation marks)


Re: Is LDA Broken?

Posted by Sean Owen <sr...@gmail.com>.
(Yes there is no guarantee when a reclaimable object is finalize()-ed and
removed. But as an aside, I believe we all understand that relying on that
behavior is wrong? Letting finalize() close a stream is a bug for sure. If
we're suggesting that's the case here, that's bad. All streams ought to be
explicitly closed.)


Re: Is LDA Broken?

Posted by Benson Margulies <bi...@gmail.com>.
Welcome to the Java garbage collector. You never know when it will
close a stream.


Re: Is LDA Broken?

Posted by Grant Ingersoll <gs...@apache.org>.
Jeff, I just put up a patch on MAHOUT-694 that closes the input stream. Does that fix it for you? I can't reproduce it here, which is weird, because my ulimit reports the maximum number of open files as 256.
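Because the failure depends on the per-process open-file limit, it can help to check descriptor usage against that limit from inside the JVM. This is a minimal sketch under the following assumptions. It targets a HotSpot/OpenJDK-style JVM on a Unix-like OS. It relies on com.sun.management.UnixOperatingSystemMXBean, which is a JDK-specific interface and not a standard Java SE API. On any other JVM it reports that the counts are unavailable. The class name FdUsageSketch is hypothetical. The maximum reported by the JVM may not match what ulimit prints in the shell, because a JVM can raise its own soft limit at startup.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical diagnostic sketch, not part of Mahout or the build-reuters script.
public class FdUsageSketch {

    /** Returns {open, max} descriptor counts, or null if this JVM/OS does not expose them. */
    static long[] descriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // JDK-specific interface; present on HotSpot/OpenJDK for Unix-like systems.
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = descriptorUsage();
        if (usage == null) {
            System.out.println("descriptor counts are not available on this JVM/OS");
        } else {
            // An "open" value that keeps rising while files are processed points to a stream leak.
            System.out.println("open=" + usage[0] + " max=" + usage[1]);
        }
    }
}
```

Sampling descriptorUsage() before and after a directory scan shows whether the count returns to its starting level.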

-Grant

On May 20, 2011, at 4:09 PM, Jeff Eastman wrote:

> Don't know what is happening. I rebooted my Linux VM, did a clean mahout build, zapped bin/work, and got the same result. Will have to debug more later today...

--------------------------
Grant Ingersoll
Lucene Revolution -- Lucene and Solr User Conference
May 25-26 in San Francisco
www.lucenerevolution.org


Re: Is LDA Broken?

Posted by Grant Ingersoll <gs...@apache.org>.
Hmm, it works for me on Mac. "Too many open files" is a bit weird.

From the looks of it, PrefixAdditionFilter is not closing the InputStream.
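This sketch shows the kind of fix described above. A filter that opens every file it inspects and never closes the stream holds one descriptor per file, so a large directory eventually triggers "Too many open files". The example is a hypothetical plain-JDK analogue: it is a java.io.FileFilter, not Mahout's PrefixAdditionFilter or a Hadoop PathFilter, and neither of those classes' code appears in this thread. The class and helper names are illustrative. It uses try-with-resources (Java 7+) to release the stream even when reading fails.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch, not Mahout's PrefixAdditionFilter: a filter that reads
 * each file while deciding whether to accept it, and always closes the stream.
 */
public class ClosingFilterSketch implements FileFilter {

    private long bytesRead = 0;

    @Override
    public boolean accept(File file) {
        if (!file.isFile()) {
            return false; // skip directories and other non-regular entries
        }
        // try-with-resources closes the stream on every path, including exceptions,
        // so each file's descriptor is released instead of leaking.
        try (InputStream in = new FileInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                bytesRead += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public long getBytesRead() {
        return bytesRead;
    }

    /** Writes count small files into a fresh temp directory and filters them. */
    static int filterManyFiles(int count) throws IOException {
        Path dir = Files.createTempDirectory("closing-filter-sketch");
        dir.toFile().deleteOnExit(); // registered first, so deleted last
        for (int i = 0; i < count; i++) {
            Path doc = dir.resolve("doc-" + i + ".txt");
            Files.write(doc, ("document " + i).getBytes(StandardCharsets.UTF_8));
            doc.toFile().deleteOnExit();
        }
        File[] accepted = dir.toFile().listFiles(new ClosingFilterSketch());
        return accepted == null ? 0 : accepted.length;
    }

    public static void main(String[] args) throws IOException {
        // 300 files is more than the 256-file limit mentioned in the thread; because
        // each stream is closed, only one descriptor is held at a time.
        System.out.println(filterManyFiles(300));
    }
}
```

Running main prints 300. If the try-with-resources were replaced by a bare new FileInputStream(file) that is never closed, the same loop could exhaust a low descriptor limit.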

-Grant

On May 20, 2011, at 4:09 PM, Jeff Eastman wrote:

> Don't know what is happening. I rebooted my Linux VM, did a clean mahout build, zapped bin/work, and got the same result. Will have to debug more later today...

--------------------------
Grant Ingersoll
Lucene Revolution -- Lucene and Solr User Conference
May 25-26 in San Francisco
www.lucenerevolution.org


RE: Is LDA Broken?

Posted by Jeff Eastman <je...@Narus.com>.
I don't know what is happening. I rebooted my Linux VM, did a clean Mahout build, zapped bin/work, and got the same result. I will have to debug more later today...

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org] 
Sent: Friday, May 20, 2011 12:54 PM
To: dev@mahout.apache.org
Subject: Re: Is LDA Broken?

Hmm, that's weird.  I might suggest doing "clean install" first as well as deleting your examples/bin/work directory.  


Re: Is LDA Broken?

Posted by Grant Ingersoll <gs...@apache.org>.
Hmm, that's weird. I'd suggest doing a "clean install" first, as well as deleting your examples/bin/work directory.

On May 20, 2011, at 3:41 PM, Jeff Eastman wrote:

> I uncommented line 39 and am getting the same errors (index error with kmeans, 0 LL with LDA) as before. I am running on real clusters (CDH3 & MapR). Trying to run locally, I get this curious output. I don't have much time today to pursue it (in meetings all day) but will do my best:

--------------------------------------------
Grant Ingersoll
Join the LUCENE REVOLUTION
Lucene & Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org


RE: Is LDA Broken?

Posted by Jeff Eastman <je...@Narus.com>.
I uncommented line 39 and am getting the same errors as before (an index error with kmeans, a zero log likelihood with LDA). I am running on real clusters (CDH3 and MapR). When I try to run locally, I get this curious output. I don't have much time today to pursue it (I'm in meetings all day), but I will do my best:

[dev@devbox mahout]$ ./examples/bin/build-reuters.sh 
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. lda clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
Downloading Reuters-21578
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7959k  100 7959k    0     0  1145k      0  0:00:06  0:00:06 --:--:-- 1135k
Extracting...
no HADOOP_HOME set, running locally
May 20, 2011 12:35:51 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: No org.apache.lucene.benchmark.utils.ExtractReuters.props found on classpath, will use command-line arguments only
Deleting all files in ./examples/bin/work/reuters-out
May 20, 2011 12:35:56 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 4690 ms
no HADOOP_HOME set, running locally
May 20, 2011 12:35:57 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Command line arguments: {--charset=UTF-8, --chunkSize=5, --endPhase=2147483647, --fileFilterClass=org.apache.mahout.text.PrefixAdditionFilter, --input=./examples/bin/work/reuters-out/, --keyPrefix=, --output=./examples/bin/work/reuters-out-seqdir, --startPhase=0, --tempDir=temp}
Exception in thread "main" java.lang.IllegalStateException: java.io.FileNotFoundException: /home/dev/workspace/mahout/examples/bin/work/reuters-out/reut2-018.sgm-835.txt (Too many open files)
        at org.apache.mahout.text.SequenceFilesFromDirectoryFilter.accept(SequenceFilesFromDirectoryFilter.java:79)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:724)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:746)
        at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:76)
        at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:106)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:81)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
^C

-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org] 
Sent: Friday, May 20, 2011 11:17 AM
To: dev@mahout.apache.org
Subject: Re: Is LDA Broken?

yeah, sorry.  I commented out line 39: cd examples/bin

On May 20, 2011, at 1:58 PM, Jeff Eastman wrote:

> It does seem these two symptoms are of the same problem. I applied the patch; however, and now neither option runs. It appears the cd is off but I can't see where.
> 
> [dev@devbox mahout-distribution-0.5]$ time ./examples/bin/build-reuters.sh 
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. lda clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> ./examples/bin/build-reuters.sh: line 54: ./bin/mahout: No such file or directory
> ./examples/bin/build-reuters.sh: line 64: ./bin/mahout: No such file or directory
> 
> 
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org] 
> Sent: Friday, May 20, 2011 10:50 AM
> To: dev@mahout.apache.org
> Subject: Re: Is LDA Broken?
> 
> Likely so, see MAHOUT-694.  
> 
> 
> On May 20, 2011, at 1:39 PM, Sean Owen wrote:
> 
>> Oh sorry these are the same issue? Great!
>> On May 20, 2011 5:44 PM, "Jake Mannix" <ja...@gmail.com> wrote:
>>> Looks like Grant got a fix posted? Has anyone else tried it?
>>> 
>>> -jake
>>> 
>>> On Fri, May 20, 2011 at 9:32 AM, Sean Owen <sr...@gmail.com> wrote:
>>> 
>>>> I think we definitely need to figure out whether it's a bug or some other
>>>> confusion. If it's a doesn't-work-at-all bug yes probably the kind of
>> thing
>>>> that needs a fix ASAP in which case write up all you know and everyone
>> will
>>>> pile in to look at it.
>>>> 
>>>> On Fri, May 20, 2011 at 5:29 PM, Jeff Eastman <je...@narus.com> wrote:
>>>> 
>>>>> Is this an issue that should be fixed before we release? It seems to be
>>>>> broken to me.
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Jeff Eastman [mailto:jeastman@narus.com]
>>>>> Sent: Thursday, May 19, 2011 5:05 PM
>>>>> To: dev@mahout.apache.org
>>>>> Subject: Is LDA Broken?
>>>>> 
>>>>> I'm running build-reuters option 2 and the LDA runs to maxIterations (20)
>>>>> without ever producing a non-zero Log Likelihood. This is not the behavior
>>>>> that I recall from earlier runs and seems quite unlikely to be correct.
>>>>> 
>>>>> 
>>>> 
> 
> 

--------------------------------------------
Grant Ingersoll
Join the LUCENE REVOLUTION
Lucene & Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org


Re: Is LDA Broken?

Posted by Grant Ingersoll <gs...@apache.org>.
Yeah, sorry. I commented out line 39: cd examples/bin
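The thread does not show the contents of build-reuters.sh, but the errors quoted below ("./bin/mahout: No such file or directory" at lines 54 and 64) suggest the script resolves ./bin/mahout relative to whatever directory it happens to be in, so commenting out the cd on line 39 shifts every relative path after it. As a hedged sketch of one common way to avoid this class of problem, a script can locate the launcher from its own location rather than from the caller's working directory. The layout (examples/bin two levels below the distribution root, launcher at bin/mahout) and the helper name resolve_mahout_home are assumptions for illustration, not taken from the actual script.

```shell
#!/bin/sh
# Hypothetical sketch (not the actual build-reuters.sh): locate bin/mahout
# from the script's own directory so the script behaves the same no matter
# which directory it is launched from. Assumed layout:
#   <distribution>/examples/bin/<this script>
#   <distribution>/bin/mahout

# Print the absolute distribution root for a script living in directory $1
# (assumed to be two levels below the root).
resolve_mahout_home() {
  (cd "$1/../.." && pwd)
}

# Absolute path of the directory containing this script.
SCRIPT_DIR="$(cd "$(dirname -- "$0")" && pwd)"
MAHOUT_HOME="$(resolve_mahout_home "$SCRIPT_DIR")"
MAHOUT="$MAHOUT_HOME/bin/mahout"

# Report a clear message instead of "./bin/mahout: No such file or directory"
# (a real script would most likely exit with an error here).
if [ ! -x "$MAHOUT" ]; then
  echo "Cannot find executable $MAHOUT; is this script in examples/bin?" >&2
else
  echo "Using launcher: $MAHOUT"
fi
```

Under these assumptions, running ./examples/bin/build-reuters.sh from the distribution root or ./build-reuters.sh from inside examples/bin would both resolve MAHOUT to the same absolute path, so no cd is needed before invoking it.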

On May 20, 2011, at 1:58 PM, Jeff Eastman wrote:

> It does seem these two symptoms are of the same problem. I applied the patch, however, and now neither option runs. It appears the cd is off, but I can't see where.
> 
> [dev@devbox mahout-distribution-0.5]$ time ./examples/bin/build-reuters.sh 
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. lda clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> ./examples/bin/build-reuters.sh: line 54: ./bin/mahout: No such file or directory
> ./examples/bin/build-reuters.sh: line 64: ./bin/mahout: No such file or directory
> 
> 
> -----Original Message-----
> From: Grant Ingersoll [mailto:gsingers@apache.org] 
> Sent: Friday, May 20, 2011 10:50 AM
> To: dev@mahout.apache.org
> Subject: Re: Is LDA Broken?
> 
> Likely so, see MAHOUT-694.  
> 
> 
> On May 20, 2011, at 1:39 PM, Sean Owen wrote:
> 
>> Oh, sorry, these are the same issue? Great!
>> On May 20, 2011 5:44 PM, "Jake Mannix" <ja...@gmail.com> wrote:
>>> Looks like Grant got a fix posted? Has anyone else tried it?
>>> 
>>> -jake
>>> 
>>> On Fri, May 20, 2011 at 9:32 AM, Sean Owen <sr...@gmail.com> wrote:
>>> 
>>>> I think we definitely need to figure out whether it's a bug or some other
>>>> confusion. If it's a doesn't-work-at-all bug, yes, it's probably the kind of
>>>> thing that needs a fix ASAP, in which case write up all you know and everyone
>>>> will pile in to look at it.
>>>> 
>>>> On Fri, May 20, 2011 at 5:29 PM, Jeff Eastman <je...@narus.com> wrote:
>>>> 
>>>>> Is this an issue that should be fixed before we release? It seems to be
>>>>> broken to me.
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Jeff Eastman [mailto:jeastman@narus.com]
>>>>> Sent: Thursday, May 19, 2011 5:05 PM
>>>>> To: dev@mahout.apache.org
>>>>> Subject: Is LDA Broken?
>>>>> 
>>>>> I'm running build-reuters option 2 and the LDA runs to maxIterations (20)
>>>>> without ever producing a non-zero Log Likelihood. This is not the behavior
>>>>> that I recall from earlier runs and seems quite unlikely to be correct.
>>>>> 
>>>>> 
>>>> 
> 
> 



RE: Is LDA Broken?

Posted by Jeff Eastman <je...@Narus.com>.
It does seem these two symptoms are of the same problem. I applied the patch, however, and now neither option runs. It appears the cd is off, but I can't see where.

[dev@devbox mahout-distribution-0.5]$ time ./examples/bin/build-reuters.sh 
Please select a number to choose the corresponding clustering algorithm
1. kmeans clustering
2. lda clustering
Enter your choice : 1
ok. You chose 1 and we'll use kmeans Clustering
./examples/bin/build-reuters.sh: line 54: ./bin/mahout: No such file or directory
./examples/bin/build-reuters.sh: line 64: ./bin/mahout: No such file or directory


-----Original Message-----
From: Grant Ingersoll [mailto:gsingers@apache.org] 
Sent: Friday, May 20, 2011 10:50 AM
To: dev@mahout.apache.org
Subject: Re: Is LDA Broken?

Likely so, see MAHOUT-694.  


On May 20, 2011, at 1:39 PM, Sean Owen wrote:

> Oh, sorry, these are the same issue? Great!
> On May 20, 2011 5:44 PM, "Jake Mannix" <ja...@gmail.com> wrote:
>> Looks like Grant got a fix posted? Has anyone else tried it?
>> 
>> -jake
>> 
>> On Fri, May 20, 2011 at 9:32 AM, Sean Owen <sr...@gmail.com> wrote:
>> 
>>> I think we definitely need to figure out whether it's a bug or some other
>>> confusion. If it's a doesn't-work-at-all bug, yes, it's probably the kind of
>>> thing that needs a fix ASAP, in which case write up all you know and everyone
>>> will pile in to look at it.
>>> 
>>> On Fri, May 20, 2011 at 5:29 PM, Jeff Eastman <je...@narus.com> wrote:
>>> 
>>>> Is this an issue that should be fixed before we release? It seems to be
>>>> broken to me.
>>>> 
>>>> -----Original Message-----
>>>> From: Jeff Eastman [mailto:jeastman@narus.com]
>>>> Sent: Thursday, May 19, 2011 5:05 PM
>>>> To: dev@mahout.apache.org
>>>> Subject: Is LDA Broken?
>>>> 
>>>> I'm running build-reuters option 2 and the LDA runs to maxIterations (20)
>>>> without ever producing a non-zero Log Likelihood. This is not the behavior
>>>> that I recall from earlier runs and seems quite unlikely to be correct.
>>>> 
>>>> 
>>> 



Re: Is LDA Broken?

Posted by Grant Ingersoll <gs...@apache.org>.
Likely so, see MAHOUT-694.  


On May 20, 2011, at 1:39 PM, Sean Owen wrote:

> Oh, sorry, these are the same issue? Great!
> On May 20, 2011 5:44 PM, "Jake Mannix" <ja...@gmail.com> wrote:
>> Looks like Grant got a fix posted? Has anyone else tried it?
>> 
>> -jake
>> 
>> On Fri, May 20, 2011 at 9:32 AM, Sean Owen <sr...@gmail.com> wrote:
>> 
>>> I think we definitely need to figure out whether it's a bug or some other
>>> confusion. If it's a doesn't-work-at-all bug, yes, it's probably the kind of
>>> thing that needs a fix ASAP, in which case write up all you know and everyone
>>> will pile in to look at it.
>>> 
>>> On Fri, May 20, 2011 at 5:29 PM, Jeff Eastman <je...@narus.com> wrote:
>>> 
>>>> Is this an issue that should be fixed before we release? It seems to be
>>>> broken to me.
>>>> 
>>>> -----Original Message-----
>>>> From: Jeff Eastman [mailto:jeastman@narus.com]
>>>> Sent: Thursday, May 19, 2011 5:05 PM
>>>> To: dev@mahout.apache.org
>>>> Subject: Is LDA Broken?
>>>> 
>>>> I'm running build-reuters option 2 and the LDA runs to maxIterations (20)
>>>> without ever producing a non-zero Log Likelihood. This is not the behavior
>>>> that I recall from earlier runs and seems quite unlikely to be correct.
>>>> 
>>>> 
>>> 



Re: Is LDA Broken?

Posted by Sean Owen <sr...@gmail.com>.
Oh, sorry, these are the same issue? Great!
On May 20, 2011 5:44 PM, "Jake Mannix" <ja...@gmail.com> wrote:
> Looks like Grant got a fix posted? Has anyone else tried it?
>
> -jake
>
> On Fri, May 20, 2011 at 9:32 AM, Sean Owen <sr...@gmail.com> wrote:
>
>> I think we definitely need to figure out whether it's a bug or some other
>> confusion. If it's a doesn't-work-at-all bug, yes, it's probably the kind of
>> thing that needs a fix ASAP, in which case write up all you know and everyone
>> will pile in to look at it.
>>
>> On Fri, May 20, 2011 at 5:29 PM, Jeff Eastman <je...@narus.com> wrote:
>>
>> > Is this an issue that should be fixed before we release? It seems to be
>> > broken to me.
>> >
>> > -----Original Message-----
>> > From: Jeff Eastman [mailto:jeastman@narus.com]
>> > Sent: Thursday, May 19, 2011 5:05 PM
>> > To: dev@mahout.apache.org
>> > Subject: Is LDA Broken?
>> >
>> > I'm running build-reuters option 2 and the LDA runs to maxIterations (20)
>> > without ever producing a non-zero Log Likelihood. This is not the behavior
>> > that I recall from earlier runs and seems quite unlikely to be correct.
>> >
>> >
>>

Re: Is LDA Broken?

Posted by Jake Mannix <ja...@gmail.com>.
Looks like Grant got a fix posted?  Has anyone else tried it?

  -jake

On Fri, May 20, 2011 at 9:32 AM, Sean Owen <sr...@gmail.com> wrote:

> I think we definitely need to figure out whether it's a bug or some other
> confusion. If it's a doesn't-work-at-all bug, yes, it's probably the kind of
> thing that needs a fix ASAP, in which case write up all you know and everyone
> will pile in to look at it.
>
> On Fri, May 20, 2011 at 5:29 PM, Jeff Eastman <je...@narus.com> wrote:
>
> > Is this an issue that should be fixed before we release? It seems to be
> > broken to me.
> >
> > -----Original Message-----
> > From: Jeff Eastman [mailto:jeastman@narus.com]
> > Sent: Thursday, May 19, 2011 5:05 PM
> > To: dev@mahout.apache.org
> > Subject: Is LDA Broken?
> >
> > I'm running build-reuters option 2 and the LDA runs to maxIterations (20)
> > without ever producing a non-zero Log Likelihood. This is not the behavior
> > that I recall from earlier runs and seems quite unlikely to be correct.
> >
> >
>

Re: Is LDA Broken?

Posted by Sean Owen <sr...@gmail.com>.
I think we definitely need to figure out whether it's a bug or some other
confusion. If it's a doesn't-work-at-all bug, yes, it's probably the kind of
thing that needs a fix ASAP, in which case write up all you know and everyone
will pile in to look at it.

On Fri, May 20, 2011 at 5:29 PM, Jeff Eastman <je...@narus.com> wrote:

> Is this an issue that should be fixed before we release? It seems to be
> broken to me.
>
> -----Original Message-----
> From: Jeff Eastman [mailto:jeastman@narus.com]
> Sent: Thursday, May 19, 2011 5:05 PM
> To: dev@mahout.apache.org
> Subject: Is LDA Broken?
>
> I'm running build-reuters option 2 and the LDA runs to maxIterations (20)
> without ever producing a non-zero Log Likelihood. This is not the behavior
> that I recall from earlier runs and seems quite unlikely to be correct.
>
>

RE: Is LDA Broken?

Posted by Jeff Eastman <je...@Narus.com>.
Is this an issue that should be fixed before we release? It seems to be broken to me.

-----Original Message-----
From: Jeff Eastman [mailto:jeastman@narus.com] 
Sent: Thursday, May 19, 2011 5:05 PM
To: dev@mahout.apache.org
Subject: Is LDA Broken?

I'm running build-reuters option 2 and the LDA runs to maxIterations (20) without ever producing a non-zero Log Likelihood. This is not the behavior that I recall from earlier runs and seems quite unlikely to be correct.