Posted to dev@mahout.apache.org by "Isabel Drost (JIRA)" <ji...@apache.org> on 2008/04/08 16:23:25 UTC

[jira] Created: (MAHOUT-31) Implementation of PLSI that uses EM

Implementation of PLSI that uses EM
-----------------------------------

                 Key: MAHOUT-31
                 URL: https://issues.apache.org/jira/browse/MAHOUT-31
             Project: Mahout
          Issue Type: New Feature
            Reporter: Isabel Drost


This should implement the proposal in the original Google Paper on PLSI in news retrieval.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589162#action_12589162 ] 

Grant Ingersoll commented on MAHOUT-31:
---------------------------------------

I totally agree with Doug and Ian about patent reading, etc.  

That being said, I think this one is sufficiently tainted and that we should not do anything with it without legal permission from the patent holder at this point b/c it would be pretty darn difficult to say you weren't aware of _something_ existing, if it did indeed prove out to apply.   I, for one, am not interested in the time it takes to get legal permission (even though I can safely declare that I have not read the link above), so in my view, unfortunately, this one is dead.



[jira] Commented: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589137#action_12589137 ] 

Doug Cutting commented on MAHOUT-31:
------------------------------------

> you or any other apache guy are NOT qualified to judge if a patent applies

+1, nor whether its owner objects to your use of any patent.

I would generally discourage folks from doing patent research when implementing Apache code.  It is usually both a waste of time and dangerous, since it opens you to the possibility of treble damages.  In particular, if you are involved in patching this issue, please do not read the above cited patent.

A patent holder may tell us if they believe we have infringed their patents.  We should generally wait for that event, and not pro-actively seek permission.




[jira] Commented: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by "Karl Wettin (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587450#action_12587450 ] 

Karl Wettin commented on MAHOUT-31:
-----------------------------------

You can take this up on legal-discuss and see what they say.

http://mail-archives.apache.org/mod_mbox/www-legal-discuss/



[jira] Resolved: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by "Sean Owen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-31.
-----------------------------

    Resolution: Won't Fix



Re: [jira] Commented: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by Ted Dunning <td...@veoh.com>.
You can take a look at Buntine and Jakulin, Discrete Components Analysis.

http://cosco.hiit.fi/Articles/buntineBohinj.pdf

Note that the optimizations are nice, but the basic Gibbs sampler is
actually pretty simple.  My first implementation was about 20 lines of R and
it wasn't actually so slow for training.  With a decent cluster, the simpler
algorithms may actually be better.

(btw, if I include the full reference, I get kicked out with a high spam
score.  Apparently names with initials or something are bad).


On 4/15/08 7:22 PM, "Ian Holsman" <li...@holsman.net> wrote:

> Ted Dunning wrote:
>> We should consider changing algorithms.
>> 
>> MDCA is a good candidate.  So would be nested Dirichlet processes.  Neither
>> of these is necessarily all that much more difficult to implement than PLSI
>> and both should give better results.
>> 
>>   
> Hi Ted.
> Can you give me a pointer to something that describes MDCA? Everything
> Google finds is behind a paywall.
> 
> regards
> Ian
> 


Re: [jira] Commented: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by Ian Holsman <li...@holsman.net>.
Ted Dunning wrote:
> We should consider changing algorithms.
>
> MDCA is a good candidate.  So would be nested Dirichlet processes.  Neither
> of these is necessarily all that much more difficult to implement than PLSI
> and both should give better results.
>
>   
Hi Ted.
Can you give me a pointer to something that describes MDCA? Everything
Google finds is behind a paywall.

regards
Ian


RE: [jira] Commented: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by "Goel, Ankur" <An...@corp.aol.com>.
+1 for changing the algorithms. The safest and least disruptive course
would be to simply leave the issue (or better, mark it 'Won't Fix') and
change the algorithms.

I haven't taken a look at MDCA, so I can't comment.
I will take a look at the Dirichlet process implementation that Ted
attached as soon as I find some time.

-----Original Message-----
From: Ted Dunning [mailto:tdunning@veoh.com] 
Sent: Wednesday, April 16, 2008 1:27 AM
To: mahout-dev@lucene.apache.org
Subject: Re: [jira] Commented: (MAHOUT-31) Implementation of PLSI that
uses EM


We should consider changing algorithms.

MDCA is a good candidate.  So would be nested Dirichlet processes.
Neither of these is necessarily all that much more difficult to
implement than PLSI and both should give better results.


On 4/15/08 12:52 PM, "Grant Ingersoll (JIRA)" <ji...@apache.org> wrote:

> 
>     [ https://issues.apache.org/jira/browse/MAHOUT-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589206#action_12589206 ]
> 
> Grant Ingersoll commented on MAHOUT-31:
> ---------------------------------------
> 
> My bad, I thought there was a patch here.  I just want to avoid the 
> case of someone who has knowledge that they think they are infringing 
> and still puts up a patch.
> 
> So, in that case, I am fine if someone other than Ankur takes it up 
> (or who works with Ankur, I think).  I just am a bit paranoid since we
> are so early stage, I don't want anything to derail the positive 
> momentum we have going here.
> 
>> Implementation of PLSI that uses EM
>> -----------------------------------
>> 
>>                 Key: MAHOUT-31
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-31
>>             Project: Mahout
>>          Issue Type: New Feature
>>            Reporter: Isabel Drost
>> 
>> This should implement the proposal in the original Google Paper on 
>> PLSI in news retrieval.


Re: [jira] Commented: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by Ted Dunning <td...@veoh.com>.
We should consider changing algorithms.

MDCA is a good candidate.  So would be nested Dirichlet processes.  Neither
of these is necessarily all that much more difficult to implement than PLSI
and both should give better results.


On 4/15/08 12:52 PM, "Grant Ingersoll (JIRA)" <ji...@apache.org> wrote:

> 
>     [ https://issues.apache.org/jira/browse/MAHOUT-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589206#action_12589206 ]
> 
> Grant Ingersoll commented on MAHOUT-31:
> ---------------------------------------
> 
> My bad, I thought there was a patch here.  I just want to avoid the case of
> someone who has knowledge that they think they are infringing and still puts
> up a patch.
> 
> So, in that case, I am fine if someone other than Ankur takes it up (or who
> works with Ankur, I think).  I just am a bit paranoid since we are so early
> stage, I don't want anything to derail the positive momentum we have going
> here.
> 
>> Implementation of PLSI that uses EM
>> -----------------------------------
>> 
>>                 Key: MAHOUT-31
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-31
>>             Project: Mahout
>>          Issue Type: New Feature
>>            Reporter: Isabel Drost
>> 
>> This should implement the proposal in the original Google Paper on PLSI in
>> news retrieval.


[jira] Commented: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589206#action_12589206 ] 

Grant Ingersoll commented on MAHOUT-31:
---------------------------------------

My bad, I thought there was a patch here.  I just want to avoid the case of someone who has knowledge that they think they are infringing and still puts up a patch.

So, in that case, I am fine if someone other than Ankur takes it up (or who works with Ankur, I think).  I just am a bit paranoid since we are so early stage, I don't want anything to derail the positive momentum we have going here.



[jira] Commented: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by "Ian Holsman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587423#action_12587423 ] 

Ian Holsman commented on MAHOUT-31:
-----------------------------------

please note.
you or any other apache guy are NOT qualified to judge if a patent applies or doesn't apply to a specific piece of code.
it is a LEGAL thing, not a technical issue.

If you want to pursue this the easiest way forward is for google (the patent owner) to give the ASF a license. 



[jira] Commented: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586818#action_12586818 ] 

Ankur commented on MAHOUT-31:
-----------------------------

If I am correct then this looks like the right place for the first patch on Mahout-4.



[jira] Commented: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589170#action_12589170 ] 

Doug Cutting commented on MAHOUT-31:
------------------------------------

> we should not do anything with it without legal permission

By that argument, we'd soon do nothing with any patch.  One can easily search patent databases and find patents that might read on nearly any patch.

I personally have no idea whether that patent reads on this issue.  We don't even have a patch here yet!  Ankur has read a patent that he thinks may read on something that has yet to be implemented.  So I think it would be a mistake for Ankur to implement this.  But beyond that, I don't see a need to restrict our actions here.





[jira] Commented: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by "Ankur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587137#action_12587137 ] 

Ankur commented on MAHOUT-31:
-----------------------------

Before work can progress on this, I would like to put a word of CAUTION here.  PLSI using EM is a subset of the work Google did for its news personalization, and a patent has been awarded to Google for it. The details can be found at 

http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.html&r=1&f=G&l=50&s1=%2220070038659%22.PGNR.&OS=DN/20070038659&RS=DN/20070038659

I am not sure whether work on this issue would infringe the patent.

Maybe someone else with a better understanding of these things can provide some clarification.



[jira] Commented: (MAHOUT-31) Implementation of PLSI that uses EM

Posted by "Isabel Drost (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587255#action_12587255 ] 

Isabel Drost commented on MAHOUT-31:
------------------------------------

I think that is an important point - it also applies to PLSI itself although I think the patent for that does not belong to Google...

Maybe some of the Mahout guys with more Apache experience can help out here? 
