Posted to user@mahout.apache.org by Jeff Eastman <jd...@windwardsolutions.com> on 2010/05/10 20:35:20 UTC

Wiki and Safari

Anybody else have problems viewing our Wiki using Safari? The pictures 
on this page don't render for me and I also cannot see many of the 
{code} blocks on other pages.  Firefox seems to be just fine.

On 5/10/10 7:39 AM, Robin Anil wrote:
> https://cwiki.apache.org/confluence/display/MAHOUT/k-Means
>
>

Re: Clustering users

Posted by Sean Owen <sr...@gmail.com>.

I believe those jobs will internally create whatever they need along
the way, including user vectors if needed.

To just create them by themselves, you could run ToItemPrefsMapper and
ToUserVectorReducer from org.apache.mahout.cf.taste.hadoop.item.


On Tue, May 11, 2010 at 5:51 AM, First Qaxy <qa...@yahoo.ca> wrote:
> Hello,
> Just started looking into clustering - KMeansDriver - and have a question on clustering users. Considering that overall I have a huge number of items, how do I create the vectors for the users? Is there any code in Mahout to support that, or do I need to write it by turning all userN, itemM pairs (boolean pref) into a userN vector? I'm not exactly sure what this vector would look like. Also, at the end of the clustering I need to show the original user. Is that possible?
> Any pointers would be great.
> Thanks,
> -qf
>
>

Re: Clustering users

Posted by Ted Dunning <te...@gmail.com>.

Generally, you want to do a bit of projection on these data before
clustering.

One option is random projection.  This maps each item to a sparse binary
vector based on a few independent hashes of the original item id.  This
gives you a moderate-dimensional vector to do clustering in (say 100,000
dimensions instead of the original gazillion).
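
A minimal sketch of the hashing idea above, in plain Python rather than Mahout code. The function names, the use of MD5 as the hash, and the choice of three hashes per item are all assumptions made for illustration; only the 100,000-dimension figure comes from the message.

```python
import hashlib

def item_to_indices(item_id, dims=100_000, num_hashes=3):
    """Map one item id to a few positions in a dims-wide binary vector."""
    indices = set()
    for seed in range(num_hashes):
        # Prefixing the seed gives a different (roughly independent) hash per round.
        digest = hashlib.md5(f"{seed}:{item_id}".encode("utf-8")).digest()
        indices.add(int.from_bytes(digest[:8], "big") % dims)
    return indices

def project_user(item_ids, dims=100_000, num_hashes=3):
    """Sparse binary user vector: the union of the positions of the user's items."""
    positions = set()
    for item_id in item_ids:
        positions |= item_to_indices(item_id, dims, num_hashes)
    return positions

# A user with five items ends up with at most 15 non-zero positions.
user_vector = project_user([1001, 1002, 1003, 1004, 1005])
```

The user vector is stored as a set of non-zero positions, so its size depends on how many items the user has, not on the size of the item catalogue.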

The other option is SVD.  With a gazillion columns in your matrix, you may
want to do the random projection trick first and then do an SVD.  The
resulting 10-30 dimensional representation for users is likely to cluster
much better than the original data.

The random projection you would need to implement.  The SVD can be done once
you have Mahout vectors to play with.

On Mon, May 10, 2010 at 9:51 PM, First Qaxy <qa...@yahoo.ca> wrote:

> Considering that overall I have a huge number of items, how do I create the
> vectors for the users? Is there any code in Mahout to support that or do I
> need to write it by turning all userN, itemM pairs (boolean pref) into a
> userN vector ?

Re: Clustering users

Posted by Jeff Eastman <jd...@windwardsolutions.com>.

In general, you will need to write a custom Mapper that accepts whatever 
format your user data is in and converts it to Mahout Vectors in a 
preprocessing job. Look at the Synthetic Control clustering jobs in 
/examples/ for an instance in the Canopy package. Once you have your 
input data as a sequence file [key=Text(userId); 
value=VectorWritable(prefs)] you can run the KMeansDriver - or any of 
the other clustering jobs - on it directly.

The user vectors would indeed be constructed out of the M items for each 
user e.g. user-i = [item-i0, item-i1, ..., item-iM]. If you wrap the 
user vector in a NamedVectorWritable you can attach the userId as the 
vector name and it will pass through the clustering and out the end in 
the clusteredPoints. Just map your boolean preferences to 0 and 1; a 
ManhattanDistanceMeasure would be a good place to start too.
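
As a rough illustration of the vectors described above, here is a plain-Python sketch. It does not use Mahout's Mapper or VectorWritable classes; `build_user_vectors` and `manhattan_distance` are hypothetical helpers, and the sample pairs are made up.

```python
from collections import defaultdict

def build_user_vectors(pairs):
    """Group (user_id, item_id) pairs into sparse boolean vectors keyed by user id."""
    vectors = defaultdict(dict)
    for user_id, item_id in pairs:
        vectors[user_id][item_id] = 1.0  # boolean preference -> 1; absent items count as 0
    return dict(vectors)

def manhattan_distance(a, b):
    """Sum of absolute differences over the union of non-zero entries."""
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in a.keys() | b.keys())

pairs = [(101, 1001), (101, 1002), (102, 1002), (102, 1003)]
vectors = build_user_vectors(pairs)
distance = manhattan_distance(vectors[101], vectors[102])
```

Keeping the user id as the dictionary key plays the same role as naming the vector: the id travels with the vector, so each clustered point can be traced back to its user.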

When you are done, you will have a 'clusteredPoints' directory with more 
sequence files [key=IntWritable(clusterId); value=VectorWritable(prefs)] 
which you can feed into subsequent processing or output with the 
ClusterDumper. Have fun and let me know if you need any more hints.

Jeff

On 5/10/10 9:51 PM, First Qaxy wrote:
> Hello,
> Just started looking into clustering - KMeansDriver - and have a question on clustering users. Considering that overall I have a huge number of items, how do I create the vectors for the users? Is there any code in Mahout to support that, or do I need to write it by turning all userN, itemM pairs (boolean pref) into a userN vector? I'm not exactly sure what this vector would look like. Also, at the end of the clustering I need to show the original user. Is that possible?
> Any pointers would be great.
> Thanks,
> -qf
>
>
>

Re: RecommenderJob output

Posted by Sean Owen <sr...@gmail.com>.

The values are entries in the final recommendation vector. They don't
have a good interpretation by themselves, but larger values should
mean better recommendation. So the recommendations are ordered by this
value. It's included just in case it is useful. In other recommender
systems (like .pseudo), this would be the actual estimated preference.

However I don't immediately see why the result would be negative
infinity, ever. I'd have to look into that.
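
One way to picture "entries in the final recommendation vector" is a co-occurrence matrix multiplied by the user's preference vector. The sketch below is plain Python under that assumption, not the job's actual code. The `prefs` data is my reading of the flattened input quoted in the question, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

def cooccurrence(prefs):
    """Count, for each ordered item pair, how many users have both items."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in prefs.values():
        for a, b in permutations(items, 2):
            counts[a][b] += 1
    return counts

def recommend(prefs, user_id):
    """Score each unseen item as its co-occurrence row dotted with the user vector."""
    counts = cooccurrence(prefs)
    seen = prefs[user_id]
    scores = {}
    for candidate in counts:
        if candidate in seen:
            continue  # never recommend items the user already has
        score = float(sum(counts[candidate].get(item, 0) for item in seen))
        if score > 0:
            scores[candidate] = score
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

prefs = {
    101: {1001, 1002, 1003, 1004, 1005},
    102: {1002, 1003},
    103: {1002, 1003, 1004},
    105: {1001, 1002, 1003, 1004, 1015},
    106: {1002, 1003, 1004, 1020, 1021},
}
recs_101 = recommend(prefs, 101)
```

Under these assumptions user 101 gets 4.0 for item 1015 and 3.0 for items 1020 and 1021, which agrees with the values posted for that user. The sketch is not claimed to reproduce every number in the thread, since the job's exact computation is not shown here.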

On Tue, May 11, 2010 at 7:11 AM, First Qaxy <qa...@yahoo.ca> wrote:
> Hello,
> When running the RecommenderJob with --booleanData false on this input:
> 101,1001
> 101,1002
> 101,1003
> 101,1004
> 101,1005
> 102,1002
> 102,1003
> 103,1002
> 103,1003
> 103,1004
> 105,1001
> 105,1002
> 105,1003
> 105,1004
> 105,1015
> 106,1002
> 106,1003
> 106,1004
> 106,1020
> 106,1021
> the output that I'm getting has:
> 101     [1015:4.0,1021:3.0,1020:3.0,1005:-Infinity,1004:-Infinity,1003:-Infinity,1001:-Infinity,1002:-Infinity]
> 102     [1004:10.0,1005:8.0,1020:2.0,1021:2.0,1015:2.0,1003:-Infinity,1002:-Infinity]
> 103     [1005:12.0,1021:3.0,1020:3.0,1015:3.0,1004:-Infinity,1002:-Infinity,1003:-Infinity]
> 105     [1005:14.0,1020:3.0,1021:3.0,1015:-Infinity,1004:-Infinity,1003:-Infinity,1001:-Infinity,1002:-Infinity]
> 106     [1005:12.0,1021:4.0,1015:3.0,1004:-Infinity,1002:-Infinity,1003:-Infinity,1020:-Infinity]
> What is the meaning (formula) of the float number?
> > 101     [1015:4.0 <= what is 4.0?
> Thanks, -qf
>
>

Re: RecommenderJob output

Posted by First Qaxy <qa...@yahoo.ca>.

> the training model. I'm interested in the "model" deployed in production, not just for the purpose of training.

err, I meant to say : not just for the purpose of *testing*.
--- On Tue, 5/11/10, First Qaxy <qa...@yahoo.ca> wrote:

From: First Qaxy <qa...@yahoo.ca>
Subject: Re: RecommenderJob output
To: user@mahout.apache.org
Received: Tuesday, May 11, 2010, 10:01 PM

Great info. No, I'm not looking into having multiple active processes trying to update it. It's more of a single worker process that needs to update the "model" as new data becomes available (every few hours, days,... depending on the customer needs). Ideally I should be able to tell which users were affected so only their recommendations would end up being updated back to Solr. I am getting closer to the end of the evaluation process of Mahout and will soon proceed with the implementation, at which point I hope I'll be able to provide better feedback and contribute more.

On a different thread - I have a high level / best practices question: When doing clustering or classification with large datasets - is the expectation that the algorithms would run on the whole data set available or on a (carefully selected) subset, i.e. the training model? I'm interested in the "model" deployed in production, not just for the purpose of training.
If the answer is a subset, what is usually a good size relative to the full data set, and how do people approach this in order to get a representative smaller set?
-qf
--- On Tue, 5/11/10, Sean Owen <sr...@gmail.com> wrote:

From: Sean Owen <sr...@gmail.com>
Subject: Re: RecommenderJob output
To: user@mahout.apache.org
Received: Tuesday, May 11, 2010, 5:08 PM

Can you update it while it's running? Not really. It's a multi-phase
batch job and I don't think you could meaningfully change it on the
fly.

Do you need to run the whole thing every time? No, not at all. Phase 1
(item IDs to item indices) doesn't need to run every time, nor does
phase 3 (count co-occurrence). It's OK if these are a little out of
date. Phase 2 is user vector generation; while I didn't write any
ability to simply append a new user vector to its output, it's easy to
write. So you don't have to run that every time.

Phase 4 and 5 are really where the recommendation happens. Those go
together. You can limit which users it processes though with a file of
user IDs, --usersFile.

I'd say the core job is nearing maturity -- I think it's tuned and
debugged. But these kinds of practical hooks, like being able to
incrementally update aspects of the pipeline, are exactly what's
needed next. I'd welcome your input and patches in this regard.

Sean

On Tue, May 11, 2010 at 10:00 PM, First Qaxy <qa...@yahoo.ca> wrote:
> One question on the recommendation lifecycle: once a RecommenderJob has been run and the intermediate/temp model created, what is the process for maintaining it? Can I update it, or parts of it, to reflect new data?
> For example, if I have a new user, or new preferences for an existing user, that I want to compute recommendations for, can I do that by incrementally updating the internal model and regenerating only the recommendations for the user that I'm interested in?
>
> Thanks.
> -qf
> --- On Tue, 5/11/10, Sean Owen <sr...@gmail.com> wrote:
>
> From: Sean Owen <sr...@gmail.com>
> Subject: Re: RecommenderJob output
> To: user@mahout.apache.org
> Cc: mahout-user@lucene.apache.org
> Received: Tuesday, May 11, 2010, 3:55 AM
>
> I just committed more of my local changes, since I'm actively
> improving and fixing things here.
>
> My output looks more reasonable:
>
> 101     [1015:4.0,1021:3.0,1020:3.0]
> 102     [1004:10.0,1005:8.0,1021:2.0,1020:2.0,1015:2.0]
> 103     [1005:12.0,1021:3.0,1015:3.0,1020:3.0]
> 105     [1005:14.0,1021:3.0,1020:3.0]
> 106     [1005:12.0,1021:4.0,1015:3.0]
>
> So you might just try the code from head. booleanData doesn't really
> affect the output, it just enables optimizations for this case.
>
>
>

Re: RecommenderJob output

Posted by First Qaxy <qa...@yahoo.ca>.

Great info. No, I'm not looking into having multiple active processes trying to update it. It's more of a single worker process that needs to update the "model" as new data becomes available (every few hours, days,... depending on the customer needs). Ideally I should be able to tell which users were affected so only their recommendations would end up being updated back to Solr. I am getting closer to the end of the evaluation process of Mahout and will soon proceed with the implementation, at which point I hope I'll be able to provide better feedback and contribute more.

On a different thread - I have a high level / best practices question: When doing clustering or classification with large datasets - is the expectation that the algorithms would run on the whole data set available or on a (carefully selected) subset, i.e. the training model? I'm interested in the "model" deployed in production, not just for the purpose of training.
If the answer is a subset, what is usually a good size relative to the full data set, and how do people approach this in order to get a representative smaller set?
-qf
--- On Tue, 5/11/10, Sean Owen <sr...@gmail.com> wrote:

From: Sean Owen <sr...@gmail.com>
Subject: Re: RecommenderJob output
To: user@mahout.apache.org
Received: Tuesday, May 11, 2010, 5:08 PM

Can you update it while it's running? Not really. It's a multi-phase
batch job and I don't think you could meaningfully change it on the
fly.

Do you need to run the whole thing every time? No, not at all. Phase 1
(item IDs to item indices) doesn't need to run every time, nor does
phase 3 (count co-occurrence). It's OK if these are a little out of
date. Phase 2 is user vector generation; while I didn't write any
ability to simply append a new user vector to its output, it's easy to
write. So you don't have to run that every time.

Phase 4 and 5 are really where the recommendation happens. Those go
together. You can limit which users it processes though with a file of
user IDs, --usersFile.

I'd say the core job is nearing maturity -- I think it's tuned and
debugged. But these kinds of practical hooks, like being able to
incrementally update aspects of the pipeline, are exactly what's
needed next. I'd welcome your input and patches in this regard.

Sean

On Tue, May 11, 2010 at 10:00 PM, First Qaxy <qa...@yahoo.ca> wrote:
> One question on the recommendation lifecycle: once a RecommenderJob has been run and the intermediate/temp model created, what is the process for maintaining it? Can I update it, or parts of it, to reflect new data?
> For example, if I have a new user, or new preferences for an existing user, that I want to compute recommendations for, can I do that by incrementally updating the internal model and regenerating only the recommendations for the user that I'm interested in?
>
> Thanks.
> -qf
> --- On Tue, 5/11/10, Sean Owen <sr...@gmail.com> wrote:
>
> From: Sean Owen <sr...@gmail.com>
> Subject: Re: RecommenderJob output
> To: user@mahout.apache.org
> Cc: mahout-user@lucene.apache.org
> Received: Tuesday, May 11, 2010, 3:55 AM
>
> I just committed more of my local changes, since I'm actively
> improving and fixing things here.
>
> My output looks more reasonable:
>
> 101     [1015:4.0,1021:3.0,1020:3.0]
> 102     [1004:10.0,1005:8.0,1021:2.0,1020:2.0,1015:2.0]
> 103     [1005:12.0,1021:3.0,1015:3.0,1020:3.0]
> 105     [1005:14.0,1021:3.0,1020:3.0]
> 106     [1005:12.0,1021:4.0,1015:3.0]
>
> So you might just try the code from head. booleanData doesn't really
> affect the output, it just enables optimizations for this case.
>
>
>

Re: RecommenderJob output

Posted by Sean Owen <sr...@gmail.com>.

Can you update it while it's running? Not really. It's a multi-phase
batch job and I don't think you could meaningfully change it on the
fly.

Do you need to run the whole thing every time? No, not at all. Phase 1
(item IDs to item indices) doesn't need to run every time, nor does
phase 3 (count co-occurrence). It's OK if these are a little out of
date. Phase 2 is user vector generation; while I didn't write any
ability to simply append a new user vector to its output, it's easy to
write. So you don't have to run that every time.

Phase 4 and 5 are really where the recommendation happens. Those go
together. You can limit which users it processes though with a file of
user IDs, --usersFile.
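
The phase split above suggests a pattern like the following plain-Python sketch: reuse the slightly stale co-occurrence counts, merge in new preferences, and recompute recommendations only for the affected users. None of this is Mahout code. The three functions are hypothetical stand-ins for the phases, and the `users` argument plays the role that --usersFile plays in the real job.

```python
from collections import defaultdict
from itertools import permutations

def count_cooccurrence(user_items):
    """Phase 3 stand-in: co-occurrence counts, reusable while slightly out of date."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in user_items.values():
        for a, b in permutations(items, 2):
            counts[a][b] += 1
    return counts

def append_preferences(user_items, new_pairs):
    """Phase 2 stand-in: merge new (user, item) pairs and report affected users."""
    affected = set()
    for user_id, item_id in new_pairs:
        items = user_items.setdefault(user_id, set())
        if item_id not in items:
            items.add(item_id)
            affected.add(user_id)
    return affected

def recommend_for(user_items, counts, users):
    """Phases 4-5 stand-in: score unseen items, but only for the listed users."""
    results = {}
    for user_id in users:
        seen = user_items[user_id]
        scores = {}
        for candidate, row in counts.items():
            if candidate not in seen:
                score = sum(row.get(item, 0) for item in seen)
                if score > 0:
                    scores[candidate] = float(score)
        results[user_id] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return results

user_items = {101: {1001, 1002}, 102: {1002, 1003}}
counts = count_cooccurrence(user_items)  # computed once, then reused
affected = append_preferences(user_items, [(103, 1002)])
recs = recommend_for(user_items, counts, affected)
```

Here the new user 103 gets recommendations from the existing counts without the counts being rebuilt, and users 101 and 102 are not touched.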

I'd say the core job is nearing maturity -- I think it's tuned and
debugged. But these kinds of practical hooks, like being able to
incrementally update aspects of the pipeline, are exactly what's
needed next. I'd welcome your input and patches in this regard.

Sean

On Tue, May 11, 2010 at 10:00 PM, First Qaxy <qa...@yahoo.ca> wrote:
> One question on the recommendation lifecycle: once a RecommenderJob has been run and the intermediate/temp model created, what is the process for maintaining it? Can I update it, or parts of it, to reflect new data?
> For example, if I have a new user, or new preferences for an existing user, that I want to compute recommendations for, can I do that by incrementally updating the internal model and regenerating only the recommendations for the user that I'm interested in?
>
> Thanks.
> -qf
> --- On Tue, 5/11/10, Sean Owen <sr...@gmail.com> wrote:
>
> From: Sean Owen <sr...@gmail.com>
> Subject: Re: RecommenderJob output
> To: user@mahout.apache.org
> Cc: mahout-user@lucene.apache.org
> Received: Tuesday, May 11, 2010, 3:55 AM
>
> I just committed more of my local changes, since I'm actively
> improving and fixing things here.
>
> My output looks more reasonable:
>
> 101     [1015:4.0,1021:3.0,1020:3.0]
> 102     [1004:10.0,1005:8.0,1021:2.0,1020:2.0,1015:2.0]
> 103     [1005:12.0,1021:3.0,1015:3.0,1020:3.0]
> 105     [1005:14.0,1021:3.0,1020:3.0]
> 106     [1005:12.0,1021:4.0,1015:3.0]
>
> So you might just try the code from head. booleanData doesn't really
> affect the output, it just enables optimizations for this case.
>
>
>

Re: RecommenderJob output

Posted by First Qaxy <qa...@yahoo.ca>.

Thanks, I've tested it and it did stop showing the -Infinity values.

-qf
--- On Tue, 5/11/10, Sean Owen <sr...@gmail.com> wrote:

From: Sean Owen <sr...@gmail.com>
Subject: Re: RecommenderJob output
To: user@mahout.apache.org
Cc: mahout-user@lucene.apache.org
Received: Tuesday, May 11, 2010, 3:55 AM

I just committed more of my local changes, since I'm actively
improving and fixing things here.

My output looks more reasonable:

101     [1015:4.0,1021:3.0,1020:3.0]
102     [1004:10.0,1005:8.0,1021:2.0,1020:2.0,1015:2.0]
103     [1005:12.0,1021:3.0,1015:3.0,1020:3.0]
105     [1005:14.0,1021:3.0,1020:3.0]
106     [1005:12.0,1021:4.0,1015:3.0]

So you might just try the code from head. booleanData doesn't really
affect the output, it just enables optimizations for this case.

Re: RecommenderJob output

Posted by First Qaxy <qa...@yahoo.ca>.

One question on the recommendation lifecycle: once a RecommenderJob has been run and the intermediate/temp model created, what is the process for maintaining it? Can I update it, or parts of it, to reflect new data?
For example, if I have a new user, or new preferences for an existing user, that I want to compute recommendations for, can I do that by incrementally updating the internal model and regenerating only the recommendations for the user that I'm interested in?

Thanks.
-qf
--- On Tue, 5/11/10, Sean Owen <sr...@gmail.com> wrote:

From: Sean Owen <sr...@gmail.com>
Subject: Re: RecommenderJob output
To: user@mahout.apache.org
Cc: mahout-user@lucene.apache.org
Received: Tuesday, May 11, 2010, 3:55 AM

I just committed more of my local changes, since I'm actively
improving and fixing things here.

My output looks more reasonable:

101     [1015:4.0,1021:3.0,1020:3.0]
102     [1004:10.0,1005:8.0,1021:2.0,1020:2.0,1015:2.0]
103     [1005:12.0,1021:3.0,1015:3.0,1020:3.0]
105     [1005:14.0,1021:3.0,1020:3.0]
106     [1005:12.0,1021:4.0,1015:3.0]

So you might just try the code from head. booleanData doesn't really
affect the output, it just enables optimizations for this case.

Re: RecommenderJob output

Posted by Sean Owen <sr...@gmail.com>.

I just committed more of my local changes, since I'm actively
improving and fixing things here.

My output looks more reasonable:

101     [1015:4.0,1021:3.0,1020:3.0]
102     [1004:10.0,1005:8.0,1021:2.0,1020:2.0,1015:2.0]
103     [1005:12.0,1021:3.0,1015:3.0,1020:3.0]
105     [1005:14.0,1021:3.0,1020:3.0]
106     [1005:12.0,1021:4.0,1015:3.0]

So you might just try the code from head. booleanData doesn't really
affect the output, it just enables optimizations for this case.

Re: RecommenderJob output

Posted by First Qaxy <qa...@yahoo.ca>.

Sorry, typed the wrong thing - yes, it is true in fact.

--- On Tue, 5/11/10, Sean Owen <sr...@gmail.com> wrote:

From: Sean Owen <sr...@gmail.com>
Subject: Re: RecommenderJob output
To: user@mahout.apache.org
Cc: mahout-user@lucene.apache.org
Received: Tuesday, May 11, 2010, 3:23 AM

Er, wait why are you setting booleanData = false? Though the
formatting got messed up here, it looks like you do not have explicit
ratings. So you should set it to true.

On Tue, May 11, 2010 at 7:11 AM, First Qaxy <qa...@yahoo.ca> wrote:
> Hello,
> When running the RecommenderJob with --booleanData false on this input:
> 101,1001
> 101,1002
> 101,1003
> 101,1004
> 101,1005
> 102,1002
> 102,1003
> 103,1002
> 103,1003
> 103,1004
> 105,1001
> 105,1002
> 105,1003
> 105,1004
> 105,1015
> 106,1002
> 106,1003
> 106,1004
> 106,1020
> 106,1021
> the output that I'm getting has:
> 101     [1015:4.0,1021:3.0,1020:3.0,1005:-Infinity,1004:-Infinity,1003:-Infinity,1001:-Infinity,1002:-Infinity]
> 102     [1004:10.0,1005:8.0,1020:2.0,1021:2.0,1015:2.0,1003:-Infinity,1002:-Infinity]
> 103     [1005:12.0,1021:3.0,1020:3.0,1015:3.0,1004:-Infinity,1002:-Infinity,1003:-Infinity]
> 105     [1005:14.0,1020:3.0,1021:3.0,1015:-Infinity,1004:-Infinity,1003:-Infinity,1001:-Infinity,1002:-Infinity]
> 106     [1005:12.0,1021:4.0,1015:3.0,1004:-Infinity,1002:-Infinity,1003:-Infinity,1020:-Infinity]
> What is the meaning (formula) of the float number?
> > 101     [1015:4.0 <= what is 4.0?
> Thanks, -qf
>
>

Re: RecommenderJob output

Posted by Sean Owen <sr...@gmail.com>.

Er, wait why are you setting booleanData = false? Though the
formatting got messed up here, it looks like you do not have explicit
ratings. So you should set it to true.

On Tue, May 11, 2010 at 7:11 AM, First Qaxy <qa...@yahoo.ca> wrote:
> Hello,
> When running the RecommenderJob with --booleanData false on this input:
> 101,1001
> 101,1002
> 101,1003
> 101,1004
> 101,1005
> 102,1002
> 102,1003
> 103,1002
> 103,1003
> 103,1004
> 105,1001
> 105,1002
> 105,1003
> 105,1004
> 105,1015
> 106,1002
> 106,1003
> 106,1004
> 106,1020
> 106,1021
> the output that I'm getting has:
> 101     [1015:4.0,1021:3.0,1020:3.0,1005:-Infinity,1004:-Infinity,1003:-Infinity,1001:-Infinity,1002:-Infinity]
> 102     [1004:10.0,1005:8.0,1020:2.0,1021:2.0,1015:2.0,1003:-Infinity,1002:-Infinity]
> 103     [1005:12.0,1021:3.0,1020:3.0,1015:3.0,1004:-Infinity,1002:-Infinity,1003:-Infinity]
> 105     [1005:14.0,1020:3.0,1021:3.0,1015:-Infinity,1004:-Infinity,1003:-Infinity,1001:-Infinity,1002:-Infinity]
> 106     [1005:12.0,1021:4.0,1015:3.0,1004:-Infinity,1002:-Infinity,1003:-Infinity,1020:-Infinity]
> What is the meaning (formula) of the float number?
> > 101     [1015:4.0 <= what is 4.0?
> Thanks, -qf
>
>

RE: Thread Hijacking Re: RecommenderJob output

Posted by Grant Ingersoll <gs...@apache.org>.

On May 11, 2010, at 8:15 AM, Sean Owen wrote:

> (Did that happen? I only see my three replies to the original message
> -- sure, maybe that could have been one -- but all were directly
> relevant to the first message.)
> 
> (Or is this somehow looking connected to another thread because it
> shares the same subject? didn't happen for me in Gmail at least)

There were actually a few hijacks on this thread, AFAICT.  This one, the Safari one and the Clustering one.   If you look at the full headers, you'll see either a common message-id or a common reply-to header, which causes some (but not all) mail clients to automatically group those threads.
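
To illustrate the header-based grouping described above, here is a small sketch using Python's standard `email` module. It assumes the relevant header is In-Reply-To, the standard header that records which message a reply answers. The message IDs and bodies are made up for the example, and `thread_root` is a hypothetical helper.

```python
from email import message_from_string

def thread_root(msg, parents):
    """Follow In-Reply-To links upward until a message with no known parent."""
    msg_id = msg["Message-ID"]
    seen = set()
    while msg_id in parents and msg_id not in seen:  # 'seen' guards against cycles
        seen.add(msg_id)
        msg_id = parents[msg_id]
    return msg_id

raw_messages = [
    "Message-ID: <a@example.org>\nSubject: Wiki and Safari\n\nbody",
    # A reply with a changed subject still carries the parent's id.
    "Message-ID: <b@example.org>\nIn-Reply-To: <a@example.org>\n"
    "Subject: Clustering users\n\nbody",
]
messages = [message_from_string(raw) for raw in raw_messages]
parents = {m["Message-ID"]: m["In-Reply-To"] for m in messages if m["In-Reply-To"]}
roots = {m["Subject"]: thread_root(m, parents) for m in messages}
```

Both subjects resolve to the same root message, which is why a client that threads by headers shows them as one conversation even though the subject line changed.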

-Grant

Re: RecommenderJob output

Posted by Sean Owen <sr...@gmail.com>.

(Did that happen? I only see my three replies to the original message
-- sure, maybe that could have been one -- but all were directly
relevant to the first message.)

(Or is this somehow looking connected to another thread because it
shares the same subject? didn't happen for me in Gmail at least)

On Tue, May 11, 2010 at 1:10 PM, Grant Ingersoll <gs...@apache.org> wrote:
> Please, when starting a new thread, start a new message.

Re: RecommenderJob output

Posted by First Qaxy <qa...@yahoo.ca>.

Hi Grant,
I wasn't aware of that. Thanks. I'll do that going forward.
-qf

--- On Tue, 5/11/10, Grant Ingersoll <gs...@apache.org> wrote:

From: Grant Ingersoll <gs...@apache.org>
Subject: Re: RecommenderJob output
To: user@mahout.apache.org
Cc: mahout-user@lucene.apache.org
Received: Tuesday, May 11, 2010, 8:10 AM

Please, when starting a new thread, start a new message.  

See http://people.apache.org/~hossman/#threadhijack
<snip>
When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  
http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking
</snip>

On May 11, 2010, at 2:11 AM, First Qaxy wrote:

> Hello,
> When running the RecommenderJob with --booleanData false on this input:
> 101,1001
> 101,1002
> 101,1003
> 101,1004
> 101,1005
> 102,1002
> 102,1003
> 103,1002
> 103,1003
> 103,1004
> 105,1001
> 105,1002
> 105,1003
> 105,1004
> 105,1015
> 106,1002
> 106,1003
> 106,1004
> 106,1020
> 106,1021
> the output that I'm getting has:
> 101     [1015:4.0,1021:3.0,1020:3.0,1005:-Infinity,1004:-Infinity,1003:-Infinity,1001:-Infinity,1002:-Infinity]
> 102     [1004:10.0,1005:8.0,1020:2.0,1021:2.0,1015:2.0,1003:-Infinity,1002:-Infinity]
> 103     [1005:12.0,1021:3.0,1020:3.0,1015:3.0,1004:-Infinity,1002:-Infinity,1003:-Infinity]
> 105     [1005:14.0,1020:3.0,1021:3.0,1015:-Infinity,1004:-Infinity,1003:-Infinity,1001:-Infinity,1002:-Infinity]
> 106     [1005:12.0,1021:4.0,1015:3.0,1004:-Infinity,1002:-Infinity,1003:-Infinity,1020:-Infinity]
> What is the meaning (formula) of the float number?
> > 101     [1015:4.0 <= what is 4.0?
> Thanks, -qf
>

Re: RecommenderJob output

Posted by Grant Ingersoll <gs...@apache.org>.

Please, when starting a new thread, start a new message.  

See http://people.apache.org/~hossman/#threadhijack
<snip>
When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  
http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking
</snip>

On May 11, 2010, at 2:11 AM, First Qaxy wrote:

> Hello,
> When running the RecommenderJob with --booleanData false on this input:
> 101,1001
> 101,1002
> 101,1003
> 101,1004
> 101,1005
> 102,1002
> 102,1003
> 103,1002
> 103,1003
> 103,1004
> 105,1001
> 105,1002
> 105,1003
> 105,1004
> 105,1015
> 106,1002
> 106,1003
> 106,1004
> 106,1020
> 106,1021
> the output that I'm getting has:
> 101     [1015:4.0,1021:3.0,1020:3.0,1005:-Infinity,1004:-Infinity,1003:-Infinity,1001:-Infinity,1002:-Infinity]
> 102     [1004:10.0,1005:8.0,1020:2.0,1021:2.0,1015:2.0,1003:-Infinity,1002:-Infinity]
> 103     [1005:12.0,1021:3.0,1020:3.0,1015:3.0,1004:-Infinity,1002:-Infinity,1003:-Infinity]
> 105     [1005:14.0,1020:3.0,1021:3.0,1015:-Infinity,1004:-Infinity,1003:-Infinity,1001:-Infinity,1002:-Infinity]
> 106     [1005:12.0,1021:4.0,1015:3.0,1004:-Infinity,1002:-Infinity,1003:-Infinity,1020:-Infinity]
> What is the meaning (formula) of the float number?
> > 101     [1015:4.0 <= what is 4.0?
> Thanks, -qf
>

RecommenderJob output

Posted by First Qaxy <qa...@yahoo.ca>.

Hello,
When running the RecommenderJob with --booleanData false on this input:
101,1001
101,1002
101,1003
101,1004
101,1005
102,1002
102,1003
103,1002
103,1003
103,1004
105,1001
105,1002
105,1003
105,1004
105,1015
106,1002
106,1003
106,1004
106,1020
106,1021
the output that I'm getting has:
101     [1015:4.0,1021:3.0,1020:3.0,1005:-Infinity,1004:-Infinity,1003:-Infinity,1001:-Infinity,1002:-Infinity]
102     [1004:10.0,1005:8.0,1020:2.0,1021:2.0,1015:2.0,1003:-Infinity,1002:-Infinity]
103     [1005:12.0,1021:3.0,1020:3.0,1015:3.0,1004:-Infinity,1002:-Infinity,1003:-Infinity]
105     [1005:14.0,1020:3.0,1021:3.0,1015:-Infinity,1004:-Infinity,1003:-Infinity,1001:-Infinity,1002:-Infinity]
106     [1005:12.0,1021:4.0,1015:3.0,1004:-Infinity,1002:-Infinity,1003:-Infinity,1020:-Infinity]
What is the meaning (formula) of the float number?
> 101     [1015:4.0 <= what is 4.0?
Thanks, -qf

Clustering users

Posted by First Qaxy <qa...@yahoo.ca>.

Hello,
Just started looking into clustering - KMeansDriver - and have a question on clustering users. Considering that overall I have a huge number of items, how do I create the vectors for the users? Is there any code in Mahout to support that, or do I need to write it by turning all userN, itemM pairs (boolean pref) into a userN vector? I'm not exactly sure what this vector would look like. Also, at the end of the clustering I need to show the original user. Is that possible?
Any pointers would be great.
Thanks,
-qf

Re: Wiki and Safari

Posted by Ted Dunning <te...@gmail.com>.

Renders fine for me using Chrome

On Mon, May 10, 2010 at 11:35 AM, Jeff Eastman
<jd...@windwardsolutions.com> wrote:

> Anybody else have problems viewing our Wiki using Safari? The pictures on
> this page don't render for me and I also cannot see many of the {code}
> blocks on other pages.  Firefox seems to be just fine.
>
>
> On 5/10/10 7:39 AM, Robin Anil wrote:
>
>> https://cwiki.apache.org/confluence/display/MAHOUT/k-Means
>>
>>
>>
>
>

Re: Wiki and Safari

Posted by Sebastian Schelter <se...@zalando.de>.

Looks fine for me in Safari on OS X 10.4.11

2010/5/10 Jeff Eastman <jd...@windwardsolutions.com>

> Anybody else have problems viewing our Wiki using Safari? The pictures on
> this page don't render for me and I also cannot see many of the {code}
> blocks on other pages.  Firefox seems to be just fine.
>
>
> On 5/10/10 7:39 AM, Robin Anil wrote:
>
>> https://cwiki.apache.org/confluence/display/MAHOUT/k-Means
>>
>>
>>
>
>