Posted to user@mahout.apache.org by Venkatesh <vr...@aol.com> on 2010/08/15 03:38:14 UTC

LDA topic prediction on new document

 

 Hi,
Using LDA in Mahout, how do I estimate the topic distribution of a new document using previously obtained topics?
Many thanks,
Venkatesh



Re: LDA topic prediction on new document

Posted by Venkatesh <vr...@aol.com>.
 Thanks, Grant.
I was looking for something similar to PLSA fold-in, which I'm familiar with, where you keep the
topics/words and their associated probabilities fixed and re-run EM on the new set of documents.

 
Venkatesh
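For reference, the PLSA fold-in described above can be sketched as follows. This is a minimal, hypothetical Java sketch (not a Mahout API; the class FoldIn and its parameters are made up for illustration): the topic-word probabilities P(w|z) from the earlier run stay fixed, and EM re-estimates only the new document's topic mixture P(z|d). For LDA proper, the analogous step would infer the document's topic proportions under the Dirichlet prior (for example with variational inference or Gibbs sampling) rather than with plain EM.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of PLSA-style fold-in: P(w|z) is held fixed and EM
 * re-estimates only P(z|d) for one new document. Not a Mahout API.
 */
public class FoldIn {

    /**
     * @param topicWord  topicWord[z][w] = P(w|z), kept fixed (each row sums to 1)
     * @param wordCounts wordCounts[w] = count of word w in the new document
     * @param iterations number of EM iterations
     * @return theta[z] = estimated P(z|d) for the new document
     */
    public static double[] foldIn(double[][] topicWord, int[] wordCounts, int iterations) {
        int numTopics = topicWord.length;
        double[] theta = new double[numTopics];
        Arrays.fill(theta, 1.0 / numTopics); // start from a uniform topic mixture

        for (int it = 0; it < iterations; it++) {
            double[] expected = new double[numTopics];
            double total = 0.0;
            for (int w = 0; w < wordCounts.length; w++) {
                if (wordCounts[w] == 0) {
                    continue;
                }
                // E-step: P(z|d,w) is proportional to P(z|d) * P(w|z)
                double[] posterior = new double[numTopics];
                double norm = 0.0;
                for (int z = 0; z < numTopics; z++) {
                    posterior[z] = theta[z] * topicWord[z][w];
                    norm += posterior[z];
                }
                if (norm == 0.0) {
                    continue; // word has zero probability under every topic
                }
                for (int z = 0; z < numTopics; z++) {
                    expected[z] += wordCounts[w] * posterior[z] / norm;
                }
                total += wordCounts[w];
            }
            if (total == 0.0) {
                break; // nothing to fit; keep the uniform mixture
            }
            // M-step: only P(z|d) is updated; P(w|z) is never modified
            for (int z = 0; z < numTopics; z++) {
                theta[z] = expected[z] / total;
            }
        }
        return theta;
    }

    public static void main(String[] args) {
        // Two toy topics over a 4-word vocabulary (made-up numbers)
        double[][] topicWord = {
            {0.5, 0.5, 0.0, 0.0},
            {0.0, 0.0, 0.5, 0.5}
        };
        int[] newDoc = {3, 1, 0, 0}; // uses only words from topic 0
        System.out.println(Arrays.toString(foldIn(topicWord, newDoc, 50))); // prints [1.0, 0.0]
    }
}
```

Because the topic-word matrix is never updated, folding in many new documents is independent per document and cheap compared with retraining.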


 

 

-----Original Message-----
From: Grant Ingersoll <gs...@apache.org>
To: user@mahout.apache.org
Sent: Mon, Aug 16, 2010 9:14 am
Subject: Re: LDA topic prediction on new document


Hi Venkatesh,

As far as I know, there isn't support for incremental runs at this time.  I 
guess what I would probably do, depending on my computing resources, is wait for 
some number of documents to build up since the last run and then just rerun LDA.  
Others might have more insight, though.

-Grant


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search

 

Re: LDA topic prediction on new document

Posted by Grant Ingersoll <gs...@apache.org>.
Hi Venkatesh,

As far as I know, there isn't support for incremental runs at this time.  I guess what I would probably do, depending on my computing resources, is wait for some number of documents to build up since the last run and then just rerun LDA.  Others might have more insight, though.

-Grant
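Grant's suggestion amounts to batching: buffer incoming documents and launch a full LDA run once enough have accumulated since the last run. A minimal sketch, assuming a hypothetical callback rerunLda that stands in for however you launch the Mahout LDA job (the class, the callback and the threshold are illustrative, not part of Mahout):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper for the "let documents build up, then rerun" approach.
 * rerunLda stands in for whatever launches a full LDA job; not a Mahout API.
 */
public class BatchedRetrainer {
    private final List<String> corpus = new ArrayList<>();
    private final int threshold;
    private final Consumer<List<String>> rerunLda;
    private int pendingSinceLastRun = 0;
    private int runs = 0;

    public BatchedRetrainer(int threshold, Consumer<List<String>> rerunLda) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
        this.rerunLda = rerunLda;
    }

    /** Adds a document; returns true if this addition triggered a full rerun. */
    public boolean add(String document) {
        corpus.add(document);
        pendingSinceLastRun++;
        if (pendingSinceLastRun < threshold) {
            return false;
        }
        // No incremental update, so rerun over the whole corpus (old + new documents).
        rerunLda.accept(new ArrayList<>(corpus));
        pendingSinceLastRun = 0;
        runs++;
        return true;
    }

    public int runs() {
        return runs;
    }

    public static void main(String[] args) {
        BatchedRetrainer retrainer = new BatchedRetrainer(3,
            docs -> System.out.println("rerunning LDA on " + docs.size() + " documents"));
        for (int i = 0; i < 7; i++) {
            retrainer.add("document " + i);
        }
        // Reruns fire after the 3rd and 6th documents; the 7th waits for the next batch.
    }
}
```

The threshold is the knob Grant alludes to: a smaller value keeps topics fresher at the cost of more full runs on your cluster.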

On Aug 14, 2010, at 9:38 PM, Venkatesh wrote:

> 
> 
> 
> Hi
> Using LDA in mahout, how do I estimate topic distribution on new document using previously obtained
> topics.
> many thanks
> venkatesh
> 
> 



Re: FileDataModel

Posted by Sean Owen <sr...@gmail.com>.
Ah yeah I see the problem now. I'll fix that.

On Sun, Aug 15, 2010 at 5:00 PM, Tamas Jambor <ja...@gmail.com> wrote:
> DataModel model = new FileDataModel(new File("./data/test.txt"));
> //just to make sure it loads the model
> model.getNumItems();
> System.out.println(model.getMaxPreference());
>
> this prints out a NaN

Re: FileDataModel

Posted by Tamas Jambor <ja...@gmail.com>.
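import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.model.DataModel;

DataModel model = new FileDataModel(new File("./data/test.txt"));
// just to make sure it loads the model
model.getNumItems();
System.out.println(model.getMaxPreference());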

This prints NaN.

That's because maxPreference/minPreference are calculated when the inner
DataModel (the field called delegate) is created, but they are never set
in the wrapper class, FileDataModel.
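The thread doesn't show the fix that was eventually made. One possible approach, sketched here with simplified, hypothetical classes (not Mahout's source), is for the wrapper to forward getMaxPreference()/getMinPreference() to its delegate; copying the delegate's values into the superclass once the delegate is built would work too.

```java
/**
 * Simplified, hypothetical stand-ins for the classes discussed in this thread
 * (not Mahout's actual source). They reproduce the NaN symptom and show one fix.
 */
public class DelegateFixDemo {

    /** Plays the role of AbstractDataModel: min/max stay NaN until set. */
    public abstract static class AbstractModel {
        private float maxPreference = Float.NaN;
        private float minPreference = Float.NaN;

        public float getMaxPreference() { return maxPreference; }
        public float getMinPreference() { return minPreference; }
        protected void setMaxPreference(float value) { maxPreference = value; }
        protected void setMinPreference(float value) { minPreference = value; }
    }

    /** Plays the role of the inner "delegate" model: computes min/max on construction. */
    public static class InMemoryModel extends AbstractModel {
        public InMemoryModel(float[] prefs) {
            float max = Float.NEGATIVE_INFINITY;
            float min = Float.POSITIVE_INFINITY;
            for (float p : prefs) {
                max = Math.max(max, p);
                min = Math.min(min, p);
            }
            setMaxPreference(max);
            setMinPreference(min);
        }
    }

    /**
     * Plays the role of FileDataModel: it wraps a delegate. Without the two
     * overrides below, the inherited getters return the wrapper's own NaN fields.
     */
    public static class WrapperModel extends AbstractModel {
        private final InMemoryModel delegate;

        public WrapperModel(float[] prefs) {
            delegate = new InMemoryModel(prefs);
        }

        // Fix: forward to the delegate, which is where the values were computed.
        @Override
        public float getMaxPreference() { return delegate.getMaxPreference(); }

        @Override
        public float getMinPreference() { return delegate.getMinPreference(); }
    }

    public static void main(String[] args) {
        WrapperModel model = new WrapperModel(new float[] {1f, 5f, 3f});
        System.out.println(model.getMaxPreference()); // prints 5.0
    }
}
```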

On 15/08/2010 21:54, Sean Owen wrote:
> What do you mean by this? I'm not clear yet.
>
> On Sun, Aug 15, 2010 at 1:09 PM, Tamas Jambor<ja...@gmail.com>  wrote:
>    
>> Hi,
>>
>> One more possible bug, in FileDataModel, there is nothing to make sure that
>> the superclass - AbstractDataModel gets the value for maxPreference and
>> minPreference.
>>
>> Tamas
>>
>>      


Re: FileDataModel

Posted by Sean Owen <sr...@gmail.com>.
What do you mean by this? I'm not clear yet.

On Sun, Aug 15, 2010 at 1:09 PM, Tamas Jambor <ja...@gmail.com> wrote:
> Hi,
>
> One more possible bug, in FileDataModel, there is nothing to make sure that
> the superclass - AbstractDataModel gets the value for maxPreference and
> minPreference.
>
> Tamas
>

FileDataModel

Posted by Tamas Jambor <ja...@gmail.com>.
Hi,

One more possible bug: in FileDataModel, there is nothing to make sure
that the superclass, AbstractDataModel, gets values for maxPreference
and minPreference.

Tamas

Re: getAllOtherItems

Posted by Sean Owen <sr...@gmail.com>.
(True, the SVD's real benefit is that it can build more user-user
and/or item-item connections by squeezing the data down into far
fewer dimensions. In a sense, it makes more items co-rated.)
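To make this concrete, here is a minimal matrix-factorization sketch trained with plain stochastic gradient descent. It is an SVD-style approach chosen for illustration, not Mahout's own SVD recommender; the class name and hyperparameters are made up. Because every user and every item gets a factor vector, predict(u, i) is defined for every pair, including items the user shares no co-ratings on; whether those scores are useful depends on the data.

```java
import java.util.Random;

/**
 * Minimal matrix-factorization sketch trained with plain SGD. Illustrative
 * only: not Mahout's SVD recommender; names and hyperparameters are made up.
 */
public class TinyFactorization {
    private final double[][] userFactors;
    private final double[][] itemFactors;

    public TinyFactorization(int numUsers, int numItems, int k, long seed) {
        Random random = new Random(seed);
        userFactors = new double[numUsers][k];
        itemFactors = new double[numItems][k];
        // Small random initialisation breaks the symmetry of an all-zero start.
        for (double[] row : userFactors) {
            for (int f = 0; f < k; f++) row[f] = 0.1 * random.nextGaussian();
        }
        for (double[] row : itemFactors) {
            for (int f = 0; f < k; f++) row[f] = 0.1 * random.nextGaussian();
        }
    }

    /** Predicted score: dot product of the user and item factor vectors. */
    public double predict(int user, int item) {
        double score = 0.0;
        for (int f = 0; f < userFactors[user].length; f++) {
            score += userFactors[user][f] * itemFactors[item][f];
        }
        return score;
    }

    /** Each rating is {user, item, value}; lr = learning rate, reg = L2 penalty. */
    public void train(double[][] ratings, int epochs, double lr, double reg) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (double[] r : ratings) {
                int u = (int) r[0];
                int i = (int) r[1];
                double err = r[2] - predict(u, i);
                for (int f = 0; f < userFactors[u].length; f++) {
                    double uf = userFactors[u][f];
                    double itf = itemFactors[i][f];
                    userFactors[u][f] += lr * (err * itf - reg * uf);
                    itemFactors[i][f] += lr * (err * uf - reg * itf);
                }
            }
        }
    }

    /** Sum of squared errors over the given ratings. */
    public double sse(double[][] ratings) {
        double total = 0.0;
        for (double[] r : ratings) {
            double e = r[2] - predict((int) r[0], (int) r[1]);
            total += e * e;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two users who share no rated items (the case discussed in this thread).
        double[][] ratings = { {0, 0, 5}, {0, 1, 4}, {1, 2, 5}, {1, 3, 1} };
        TinyFactorization mf = new TinyFactorization(2, 4, 2, 42L);
        mf.train(ratings, 2000, 0.05, 0.01);
        // A score exists for user 0 on item 2, although nobody co-rated it with user 0.
        System.out.println(mf.predict(0, 2));
    }
}
```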

On Sun, Aug 15, 2010 at 1:18 PM, Tamas Jambor <ja...@gmail.com> wrote:
> On 15/08/2010 18:29, Sebastian Schelter wrote:
>>
>> What talk about is actually not a bug, if there are no co-rated items
>> it's absolutely logical that nothing can be recommended. In a real world
>> application you would maybe want to show the user the latest or overall
>> top-rated items as a workaround (just to show them something).
>>
>
> Thanks, it's true for neighbourhood based algorithms, but SVD could still
> give you results.
>
> Tamas
>

Re: getAllOtherItems

Posted by Ted Dunning <te...@gmail.com>.
This is definitely true and in a few very sparse problems, these might even
be usable recommendations.

In general, having some co-occurrence is a useful heuristic. If you find it
useful to drop that requirement, you can replace the candidate strategy.

For a good example of an SVD-like algorithm that could produce strong
recommendations with no co-occurrence, see:
http://arxiv.org/abs/1006.2156

We don't support side information much in Mahout, so the candidate-set
problem associated with these models isn't an issue yet, but at some point
using these models might be good. The primary benefit would be better
performance in cold-start situations.

On Sun, Aug 15, 2010 at 11:18 AM, Tamas Jambor <ja...@gmail.com> wrote:

> On 15/08/2010 18:29, Sebastian Schelter wrote:
>
>> What you're talking about is actually not a bug: if there are no co-rated items
>> it's absolutely logical that nothing can be recommended. In a real world
>> application you would maybe want to show the user the latest or overall
>> top-rated items as a workaround (just to show them something).
>>
>>
>
> Thanks, it's true for neighbourhood based algorithms, but SVD could still
> give you results.
>
> Tamas
>

Re: getAllOtherItems

Posted by Tamas Jambor <ja...@gmail.com>.
On 15/08/2010 18:29, Sebastian Schelter wrote:
> What you're talking about is actually not a bug: if there are no co-rated items
> it's absolutely logical that nothing can be recommended. In a real world
> application you would maybe want to show the user the latest or overall
> top-rated items as a workaround (just to show them something).
>    

Thanks. That's true for neighbourhood-based algorithms, but SVD could
still give you results.

Tamas

Re: getAllOtherItems

Posted by Sebastian Schelter <ss...@apache.org>.
Hi Tamas,

Are you looking at the current trunk of Mahout 0.4? We recently changed the
method to use a customizable "CandidateItemsStrategy" implementation. I
guess you're talking about PreferredItemsNeighborhoodCandidateItemsStrategy,
then, which is the default implementation.
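To see why two users with no co-rated items get nothing, here is a simplified, hypothetical sketch in plain Java (not Mahout's CandidateItemsStrategy interface or its source) contrasting a co-rating-based candidate set, roughly the behaviour described in this thread, with one that simply takes every item the user hasn't rated. The class and method names are made up for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Simplified, hypothetical illustration of two candidate-item strategies.
 * Data: userId -> set of item IDs that user has rated. Not Mahout's API.
 */
public class CandidateStrategies {

    /** Items rated by users who share at least one rated item with userId,
     *  minus the items userId has already rated. */
    public static Set<Long> coRatedNeighborhood(long userId, Map<Long, Set<Long>> ratings) {
        Set<Long> own = ratings.getOrDefault(userId, Set.of());
        Set<Long> candidates = new HashSet<>();
        for (Map.Entry<Long, Set<Long>> entry : ratings.entrySet()) {
            if (entry.getKey() == userId) {
                continue;
            }
            Set<Long> theirs = entry.getValue();
            boolean sharesItem = false;
            for (Long item : own) {
                if (theirs.contains(item)) {
                    sharesItem = true;
                    break;
                }
            }
            if (sharesItem) {
                candidates.addAll(theirs);
            }
        }
        candidates.removeAll(own);
        return candidates;
    }

    /** Every known item the user has not rated; no co-rating required. */
    public static Set<Long> allUnrated(long userId, Map<Long, Set<Long>> ratings) {
        Set<Long> candidates = new HashSet<>();
        for (Set<Long> items : ratings.values()) {
            candidates.addAll(items);
        }
        candidates.removeAll(ratings.getOrDefault(userId, Set.of()));
        return candidates;
    }

    public static void main(String[] args) {
        // Two users with no co-rated items, as in the original report.
        Map<Long, Set<Long>> ratings = Map.of(
            1L, Set.of(10L, 11L),
            2L, Set.of(20L, 21L));
        System.out.println(coRatedNeighborhood(1L, ratings)); // prints []
        System.out.println(allUnrated(1L, ratings)); // contains 20 and 21 (order may vary)
    }
}
```

The second strategy trades efficiency for coverage: on a large catalogue it hands every unrated item to the scorer, which is one reason a co-rating heuristic makes a sensible default.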

What you're talking about is actually not a bug: if there are no co-rated
items, it's absolutely logical that nothing can be recommended. In a
real-world application you might want to show the user the latest or
overall top-rated items as a workaround (just to show them something).
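The fallback suggested here could look something like the following minimal sketch, which ranks items by their average rating across all users and uses that list only when the personalized list is empty. The class, its methods and the plain average are illustrative assumptions, not Mahout code; in practice you might prefer a ranking that also accounts for how many ratings each item has.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical fallback: overall top-rated items the user hasn't rated yet. */
public class TopRatedFallback {

    /** ratings: userId -> (itemId -> rating). Returns up to howMany item IDs,
     *  highest average rating first, excluding items the user already rated. */
    public static List<Long> topRated(long userId, Map<Long, Map<Long, Double>> ratings, int howMany) {
        Map<Long, double[]> sumAndCount = new HashMap<>();
        for (Map<Long, Double> prefs : ratings.values()) {
            for (Map.Entry<Long, Double> p : prefs.entrySet()) {
                double[] sc = sumAndCount.computeIfAbsent(p.getKey(), k -> new double[2]);
                sc[0] += p.getValue(); // running sum of ratings
                sc[1] += 1;            // number of ratings
            }
        }
        Set<Long> own = ratings.getOrDefault(userId, Map.of()).keySet();
        List<Long> items = new ArrayList<>();
        for (Long item : sumAndCount.keySet()) {
            if (!own.contains(item)) {
                items.add(item);
            }
        }
        // Sort by average rating, descending.
        items.sort(Comparator.comparingDouble((Long item) -> {
            double[] sc = sumAndCount.get(item);
            return sc[0] / sc[1];
        }).reversed());
        return items.subList(0, Math.min(howMany, items.size()));
    }

    /** Use the personalized list when it is non-empty, otherwise fall back. */
    public static List<Long> recommendOrFallback(List<Long> personalized, long userId,
                                                 Map<Long, Map<Long, Double>> ratings, int howMany) {
        return personalized.isEmpty() ? topRated(userId, ratings, howMany) : personalized;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> ratings = Map.of(
            1L, Map.of(10L, 5.0, 11L, 3.0),
            2L, Map.of(20L, 4.0, 21L, 2.0));
        System.out.println(recommendOrFallback(List.of(), 1L, ratings, 2)); // prints [20, 21]
    }
}
```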

--sebastian

On 15.08.2010 19:22, Tamas Jambor wrote:
> hi,
>
> I have just spotted a possible bug in AbstractRecommender, the
> getAllOtherItems only gets items if there is at least one co-rated
> item between the target user and other users. For example if there are
> only two users in the system and they didn't co-rate any items, then
> this method returns nothing.
>
> Tamas


Re: getAllOtherItems

Posted by Grant Ingersoll <gs...@apache.org>.
From http://people.apache.org/~hossman/#threadhijack

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.

See Also:  
http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking


On Aug 15, 2010, at 1:22 PM, Tamas Jambor wrote:

> hi,
> 
> I have just spotted a possible bug in AbstractRecommender, the getAllOtherItems only gets items if there is at least one co-rated item between the target user and other users. For example, if there are only two users in the system and they didn't co-rate any items, then this method returns nothing.
> 
> Tamas



getAllOtherItems

Posted by Tamas Jambor <ja...@gmail.com>.
Hi,

I have just spotted a possible bug in AbstractRecommender:
getAllOtherItems only returns items if there is at least one co-rated item
between the target user and other users. For example, if there are only
two users in the system and they didn't co-rate any items, then this
method returns nothing.

Tamas