Posted to user@mahout.apache.org by Venkatesh <vr...@aol.com> on 2010/08/15 03:38:14 UTC
LDA topic prediction on new document
Hi
Using LDA in Mahout, how do I estimate the topic distribution of a new document using previously obtained
topics?
many thanks
venkatesh
Re: LDA topic prediction on new document
Posted by Venkatesh <vr...@aol.com>.
Thanks Grant.
I was looking for something similar to PLSA fold-in, which I'm familiar with, where you keep the
topics/words and associated probabilities fixed and re-run EM on the new set of documents.
Venkatesh
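For readers unfamiliar with the fold-in idea mentioned above, here is a minimal sketch in plain Java. It is not a Mahout API and it is not LDA's own inference procedure: it is the PLSA-style fold-in, where the per-topic word probabilities stay fixed and EM re-estimates only the topic mixture of the new document. The class name FoldIn, the method inferTopics and the array layout are assumptions made for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Mahout: PLSA-style fold-in for one new document. */
final class FoldIn {

    /**
     * @param topicWord  topicWord[k][w] = p(word w | topic k), learned earlier and held fixed
     * @param counts     counts[w] = number of occurrences of word w in the new document
     * @param iterations number of EM iterations to run
     * @return theta[k] = estimated p(topic k | new document)
     */
    static double[] inferTopics(double[][] topicWord, int[] counts, int iterations) {
        int numTopics = topicWord.length;
        double[] theta = new double[numTopics];
        Arrays.fill(theta, 1.0 / numTopics); // start from a uniform mixture

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[numTopics];
            double total = 0.0;
            for (int w = 0; w < counts.length; w++) {
                if (counts[w] == 0) {
                    continue;
                }
                // E-step: responsibility of each topic for word w
                double norm = 0.0;
                for (int k = 0; k < numTopics; k++) {
                    norm += theta[k] * topicWord[k][w];
                }
                if (norm == 0.0) {
                    continue; // no topic can explain this word; ignore it
                }
                for (int k = 0; k < numTopics; k++) {
                    next[k] += counts[w] * theta[k] * topicWord[k][w] / norm;
                }
                total += counts[w];
            }
            if (total == 0.0) {
                return theta; // nothing usable in the document
            }
            // M-step: only the document's topic mixture is updated
            for (int k = 0; k < numTopics; k++) {
                next[k] /= total;
            }
            theta = next;
        }
        return theta;
    }
}
```

Calling FoldIn.inferTopics(topicWord, counts, 50) on a document whose words are mostly characteristic of one topic should put most of the mass on that topic. The word probabilities are never modified, which is the point of fold-in.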
-----Original Message-----
From: Grant Ingersoll <gs...@apache.org>
To: user@mahout.apache.org
Sent: Mon, Aug 16, 2010 9:14 am
Subject: Re: LDA topic prediction on new document
Hi Venkatesh,
As far as I know, there isn't support for incremental runs at this time. I
guess what I would probably do, depending on my computing resources, is wait for
some number of documents to build up since the last run and then just rerun LDA.
Others might have more insight, though.
-Grant
On Aug 14, 2010, at 9:38 PM, Venkatesh wrote:
>
>
>
> Hi
> Using LDA in mahout, how do I estimate topic distribution on new document using previously obtained
> topics.
> many thanks
> venkatesh
>
>
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
Re: LDA topic prediction on new document
Posted by Grant Ingersoll <gs...@apache.org>.
Hi Venkatesh,
As far as I know, there isn't support for incremental runs at this time. I guess what I would probably do, depending on my computing resources, is wait for some number of documents to build up since the last run and then just rerun LDA. Others might have more insight, though.
-Grant
On Aug 14, 2010, at 9:38 PM, Venkatesh wrote:
>
>
>
> Hi
> Using LDA in mahout, how do I estimate topic distribution on new document using previously obtained
> topics.
> many thanks
> venkatesh
>
>
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search
Re: FileDataModel
Posted by Sean Owen <sr...@gmail.com>.
Ah yeah I see the problem now. I'll fix that.
On Sun, Aug 15, 2010 at 5:00 PM, Tamas Jambor <ja...@gmail.com> wrote:
> DataModel model = new FileDataModel(new File("./data/test.txt"));
> //just to make sure it loads the model
> model.getNumItems();
> System.out.println(model.getMaxPreference());
>
> this prints out a NaN
Re: FileDataModel
Posted by Tamas Jambor <ja...@gmail.com>.
DataModel model = new FileDataModel(new File("./data/test.txt"));
//just to make sure it loads the model
model.getNumItems();
System.out.println(model.getMaxPreference());
This prints NaN, because maxPreference/minPreference are calculated when it creates
the inner DataModel (a variable called delegate), but they are never set in the
wrapper class FileDataModel.
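The pattern described above can be reproduced with a few simplified stand-in classes. These are hypothetical and are not Mahout's actual DataModel, AbstractDataModel or FileDataModel; the NaN default in the base class is an assumption chosen to match the reported output. The sketch shows a wrapper whose delegate computes the maximum while the wrapper's own inherited field is never set, plus one possible fix (forwarding the getter to the delegate).

```java
/** Hypothetical, simplified stand-ins; these are not Mahout's classes. */
abstract class BaseModel {
    // Assumption: the base class starts out with NaN until a subclass sets a value.
    private float maxPreference = Float.NaN;

    float getMaxPreference() {
        return maxPreference;
    }

    protected void setMaxPreference(float value) {
        maxPreference = value;
    }
}

/** Inner model: computes the maximum when it is created. */
final class InMemoryModel extends BaseModel {
    InMemoryModel(float[] prefs) {
        float max = Float.NEGATIVE_INFINITY;
        for (float p : prefs) {
            max = Math.max(max, p);
        }
        setMaxPreference(max);
    }
}

/** Wrapper with the reported problem: its own inherited field is never set. */
final class BrokenWrapper extends BaseModel {
    private final BaseModel delegate;

    BrokenWrapper(float[] prefs) {
        delegate = new InMemoryModel(prefs);
    }
}

/** One possible fix: forward the getter to the delegate. */
final class FixedWrapper extends BaseModel {
    private final BaseModel delegate;

    FixedWrapper(float[] prefs) {
        delegate = new InMemoryModel(prefs);
    }

    @Override
    float getMaxPreference() {
        return delegate.getMaxPreference();
    }
}
```

With preferences {1, 5}, BrokenWrapper returns NaN from getMaxPreference() while FixedWrapper returns 5. Copying the delegate's values into the wrapper after each load would be another way to get the same result.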
On 15/08/2010 21:54, Sean Owen wrote:
> What do you mean by this? I'm not clear yet.
>
> On Sun, Aug 15, 2010 at 1:09 PM, Tamas Jambor<ja...@gmail.com> wrote:
>
>> Hi,
>>
>> One more possible bug, in FileDataModel, there is nothing to make sure that
>> the superclass - AbstractDataModel gets the value for maxPreference and
>> minPreference.
>>
>> Tamas
>>
>>
Re: FileDataModel
Posted by Sean Owen <sr...@gmail.com>.
What do you mean by this? I'm not clear yet.
On Sun, Aug 15, 2010 at 1:09 PM, Tamas Jambor <ja...@gmail.com> wrote:
> Hi,
>
> One more possible bug, in FileDataModel, there is nothing to make sure that
> the superclass - AbstractDataModel gets the value for maxPreference and
> minPreference.
>
> Tamas
>
FileDataModel
Posted by Tamas Jambor <ja...@gmail.com>.
Hi,
One more possible bug, in FileDataModel, there is nothing to make sure
that the superclass - AbstractDataModel gets the value for maxPreference
and minPreference.
Tamas
Re: getAllOtherItems
Posted by Sean Owen <sr...@gmail.com>.
(True, the SVD's real benefit is that it can build more user-user
and/or item-item connections by squeezing the data down into many
fewer dimensions. It's making more items be co-rated in a sense.)
On Sun, Aug 15, 2010 at 1:18 PM, Tamas Jambor <ja...@gmail.com> wrote:
> On 15/08/2010 18:29, Sebastian Schelter wrote:
>>
>> What talk about is actually not a bug, if there are no co-rated items
>> it's absolutely logical that nothing can be recommended. In a real world
>> application you would maybe want to show the user the latest or overall
>> top-rated items as a workaround (just to show them something).
>>
>
> Thanks, it's true for neighbourhood based algorithms, but SVD could still
> give you results.
>
> Tamas
>
Re: getAllOtherItems
Posted by Ted Dunning <te...@gmail.com>.
This is definitely true and in a few very sparse problems, these might even
be usable recommendations.
In general, having some cooccurrence is a useful heuristic. If you find it
useful, you can replace the candidate strategy.
For a good example of an SVD-like algorithm that could produce strong
recommendations with no cooccurrence, see here:
http://arxiv.org/abs/1006.2156
We don't much support side information in Mahout, so the candidate set
problem associated with these models isn't an issue yet, but at some point
using these models might be good. The primary benefit would be better
performance in cold-start situations.
On Sun, Aug 15, 2010 at 11:18 AM, Tamas Jambor <ja...@gmail.com> wrote:
> On 15/08/2010 18:29, Sebastian Schelter wrote:
>
>> What talk about is actually not a bug, if there are no co-rated items
>> it's absolutely logical that nothing can be recommended. In a real world
>> application you would maybe want to show the user the latest or overall
>> top-rated items as a workaround (just to show them something).
>>
>>
>
> Thanks, it's true for neighbourhood based algorithms, but SVD could still
> give you results.
>
> Tamas
>
Re: getAllOtherItems
Posted by Tamas Jambor <ja...@gmail.com>.
On 15/08/2010 18:29, Sebastian Schelter wrote:
> What talk about is actually not a bug, if there are no co-rated items
> it's absolutely logical that nothing can be recommended. In a real world
> application you would maybe want to show the user the latest or overall
> top-rated items as a workaround (just to show them something).
>
Thanks, it's true for neighbourhood based algorithms, but SVD could
still give you results.
Tamas
Re: getAllOtherItems
Posted by Sebastian Schelter <ss...@apache.org>.
Hi Tamas,
are you looking at the current trunk of Mahout 0.4? We made the method
use a customizable "CandidateItemsStrategy" implementation lately. I
guess you're talking about
PreferredItemsNeighborhoodCandidateItemsStrategy then which is the
default implementation.
What you talk about is actually not a bug: if there are no co-rated items,
it's absolutely logical that nothing can be recommended. In a real world
application you would maybe want to show the user the latest or overall
top-rated items as a workaround (just to show them something).
--sebastian
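The behaviour and the workaround discussed in this message can be sketched in plain Java. This does not use Mahout's CandidateItemsStrategy interface; the class CandidateItems, its methods and the map-based data layout are hypothetical, and the selection rule is only an approximation of what the thread describes (candidates come from users who share at least one rated item with the target user).

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch, not Mahout code: co-rating based candidates with a fallback. */
final class CandidateItems {

    /** Items rated by users who share at least one item with the target user, minus the target's own items. */
    static Set<Long> fromCoRaters(long userId, Map<Long, Set<Long>> itemsByUser) {
        Set<Long> own = itemsByUser.getOrDefault(userId, Collections.<Long>emptySet());
        Set<Long> candidates = new TreeSet<>();
        for (Map.Entry<Long, Set<Long>> entry : itemsByUser.entrySet()) {
            if (entry.getKey() == userId) {
                continue; // skip the target user
            }
            if (!Collections.disjoint(own, entry.getValue())) {
                candidates.addAll(entry.getValue()); // this user co-rated something
            }
        }
        candidates.removeAll(own);
        return candidates;
    }

    /** Falls back to a precomputed top-rated list when no co-rated candidates exist. */
    static Set<Long> withFallback(long userId, Map<Long, Set<Long>> itemsByUser, List<Long> topRated) {
        Set<Long> candidates = fromCoRaters(userId, itemsByUser);
        if (!candidates.isEmpty()) {
            return candidates;
        }
        Set<Long> fallback = new LinkedHashSet<>(topRated); // keeps the ranking order
        fallback.removeAll(itemsByUser.getOrDefault(userId, Collections.<Long>emptySet()));
        return fallback;
    }
}
```

With two users who rated disjoint item sets, fromCoRaters returns an empty set for either user, which matches the reported behaviour, while withFallback still returns the top-rated items the user has not rated yet.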
Am 15.08.2010 19:22, schrieb Tamas Jambor:
> hi,
>
> I have just spotted a possible bug in AbstractRecommender, the
> getAllOtherItems only gets items if there is at least one co-rated
> item between the target user and other users. For example if there are
> only two users in the system and they didn't co-rate any items than
> this method returns nothing.
>
> Tamas
Re: getAllOtherItems
Posted by Grant Ingersoll <gs...@apache.org>.
From http://people.apache.org/~hossman/#threadhijack
When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email. Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is "hidden" in that thread and gets less
attention. It makes following discussions in the mailing list archives
particularly difficult.
See Also:
http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking
On Aug 15, 2010, at 1:22 PM, Tamas Jambor wrote:
> hi,
>
> I have just spotted a possible bug in AbstractRecommender, the getAllOtherItems only gets items if there is at least one co-rated item between the target user and other users. For example if there are only two users in the system and they didn't co-rate any items than this method returns nothing.
>
> Tamas
getAllOtherItems
Posted by Tamas Jambor <ja...@gmail.com>.
hi,
I have just spotted a possible bug in AbstractRecommender, the
getAllOtherItems only gets items if there is at least one co-rated item
between the target user and other users. For example if there are only
two users in the system and they didn't co-rate any items, then this
method returns nothing.
Tamas