Posted to user@mahout.apache.org by Matt Mitchell <go...@gmail.com> on 2013/08/30 15:12:21 UTC

mahout hadoop recommenders - how to evaluate?

Hi,

I thought I asked this question once before but couldn't find the thread.
Is there an out-of-the-box way to evaluate the hadoop/offline
recommendation/similarity data? I found an article showing how to do it
with the parallelALS recommender, but not the recommenditembased (for
example).

Matt

Re: mahout hadoop recommenders - how to evaluate?

Posted by Pat Ferrel <pa...@occamsmachete.com>.
Not as far as I know. There are a bunch of issues to consider that make it difficult to do out of the box. 

We did a time-based split for the test/training hold-out: trained on the older 90% of the data and ran a precision-based MAP evaluation on the newer held-out data. The timestamp is not part of the Mahout data flow, so this would be impossible out of the box.
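The time-based split Pat describes isn't provided by Mahout, but it is easy to do as a preprocessing step. A minimal sketch, assuming the input is CSV lines of the form user,item,pref,timestamp (the column order, and timestamps as plain longs, are assumptions; the class and method names are hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;

public class TimeSplit {
    // Split "user,item,pref,timestamp" lines by time:
    // the oldest trainFraction of events go to the training file,
    // the newest remainder to the held-out test file.
    public static void split(Path input, Path train, Path test, double trainFraction)
            throws IOException {
        List<String> lines = Files.readAllLines(input);
        // Sort ascending by the timestamp field (4th column, assumed to be a long).
        lines.sort(Comparator.comparingLong((String l) -> Long.parseLong(l.split(",")[3])));
        int cut = (int) (lines.size() * trainFraction);
        Files.write(train, lines.subList(0, cut));
        Files.write(test, lines.subList(cut, lines.size()));
    }
}
```

The training file (with the timestamp column stripped if necessary) then feeds the Hadoop job, and the test file provides the held-out preferences to score against.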

That said, I sure wish we had random hold-out precision tests. These are included with the in-memory versions, and if you can run your data through them you will get virtually identical results, in my experience. There are many caveats that apply to testing recommenders, but with an understanding of them the tests are quite valuable. For instance, a lift in MAP does not necessarily produce user benefit, and offline tests cannot replace A/B tests. We use them for rapid iteration and think of them as a sort of heavyweight unit test.
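For anyone wanting to score the Hadoop job's output themselves, the MAP metric mentioned above is simple to compute once you have each user's ranked recommendation list and their held-out items. A sketch, assuming you have already parsed the recommenditembased output into per-user lists (the class and method names here are hypothetical, not Mahout APIs):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MapEval {
    // Average precision at k for one user: for each held-out item found in the
    // ranked list, take precision at that rank, then average over the
    // number of relevant items reachable within k.
    static double averagePrecision(List<Long> ranked, Set<Long> heldOut, int k) {
        if (heldOut.isEmpty()) return 0.0;
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            if (heldOut.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1); // precision at this cutoff
            }
        }
        return sum / Math.min(heldOut.size(), k);
    }

    // Mean average precision over all users with non-empty held-out sets.
    static double meanAveragePrecision(Map<Long, List<Long>> recs,
                                       Map<Long, Set<Long>> heldOut, int k) {
        double total = 0.0;
        int users = 0;
        for (Map.Entry<Long, List<Long>> e : recs.entrySet()) {
            Set<Long> relevant = heldOut.get(e.getKey());
            if (relevant == null || relevant.isEmpty()) continue;
            total += averagePrecision(e.getValue(), relevant, k);
            users++;
        }
        return users == 0 ? 0.0 : total / users;
    }
}
```

This is the same precision-at-k family of metrics the in-memory evaluators use, just applied to the offline job's output.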
