Posted to user@mahout.apache.org by han henry <hu...@gmail.com> on 2010/08/23 15:57:26 UTC

Adding feature: skip a user's non-interested items when generating recommendations for a user.

Hi all,

Sometimes a user dislikes a recommendation we generated and does not want
to see the recommended item again.

Here is an example from Amazon.com (see the attachment).

I have written a patch for it. The logic is as follows:

1) Before we run RecommenderJob, dump the user's non-interested items to HDFS
in a format like userId+"_"+item_id.
2) Load the user's invalid items into a HashMap when AggregateAndRecommendReducer
is set up.
3) Skip the user's non-interested items when choosing the top-N recommendations
for the user (a rough sketch follows below).
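
Here is a minimal sketch of what steps 2) and 3) could look like. It is not the
actual patch: the class name FilteringRecommendReducer, the "user.item.filter.path"
configuration key, the "itemId,score" candidate format and the assumption that
candidates arrive sorted by score are all illustrative assumptions.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FilteringRecommendReducer
    extends Reducer<LongWritable, Text, LongWritable, Text> {

  private static final int N = 10;

  // All "userId_itemId" pairs the users are not interested in.
  private final Set<String> nonInterested = new HashSet<String>();

  @Override
  protected void setup(Context context) throws IOException {
    // Step 2: load the filter file dumped to HDFS before the job was started.
    Path filterPath = new Path(context.getConfiguration().get("user.item.filter.path"));
    FileSystem fs = FileSystem.get(context.getConfiguration());
    BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(filterPath)));
    String line;
    while ((line = in.readLine()) != null) {
      nonInterested.add(line.trim());   // each line is userId + "_" + itemId
    }
    in.close();
  }

  @Override
  protected void reduce(LongWritable userID, Iterable<Text> candidates, Context context)
      throws IOException, InterruptedException {
    // Step 3: emit the top-N candidates, skipping filtered user/item pairs.
    // Candidates are assumed to arrive as "itemId,score", sorted by score descending.
    int emitted = 0;
    for (Text candidate : candidates) {
      String itemID = candidate.toString().split(",")[0];
      if (nonInterested.contains(userID.get() + "_" + itemID)) {
        continue;
      }
      context.write(userID, candidate);
      if (++emitted >= N) {
        break;
      }
    }
  }
}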

Does it make sense, and could it be merged into the repository?

Re: Adding feature: skip a user's non-interested items when generating recommendations for a user.

Posted by Sebastian Schelter <ss...@apache.org>.
I don't think a MapFile is a good solution as the file would have to be
accessed for every Reducer invocation to load the filter items for that
user. Correct me if I'm wrong.

--sebastian



Re: Adding feature: skip a user's non-interested items when generating recommendations for a user.

Posted by Sebastian Schelter <ss...@apache.org>.
I filed a jira ticket for this issue at
https://issues.apache.org/jira/browse/MAHOUT-493; it's scheduled for 0.5,
as I don't have much time to work on Mahout these days (I have to finish
my diploma thesis).

--sebastian



Re: Adding feature: skip a user's non-interested items when generating recommendations for a user.

Posted by han henry <hu...@gmail.com>.
For option 1), the user's invalid items can be stored in multiple files; we can
use a MapFilesMap to load the data from HDFS and then check candidate items
against the invalid items.

package org.apache.mahout.cf.taste.hadoop;

import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Opens all MapFiles under one parent directory and looks keys up across all of them. */
public final class MapFilesMap<K extends WritableComparable<?>, V extends Writable>
    implements Closeable {

  private static final Logger log = LoggerFactory.getLogger(MapFilesMap.class);

  private static final PathFilter PARTS_FILTER = new PathFilter() {
    @Override
    public boolean accept(Path path) {
      return path.getName().startsWith("part-");
    }
  };

  private final List<MapFile.Reader> readers;

  public MapFilesMap(FileSystem fs, Path parentDir, Configuration conf) throws IOException {
    log.info("Creating MapFilesMap from parent directory {}", parentDir);
    this.readers = new ArrayList<MapFile.Reader>();
    try {
      for (FileStatus status : fs.listStatus(parentDir, PARTS_FILTER)) {
        String path = status.getPath().toString();
        log.info("Adding MapFile.Reader at {}", path);
        readers.add(new MapFile.Reader(fs, path, conf));
      }
    } catch (IOException ioe) {
      close();
      throw ioe;
    }
    if (readers.isEmpty()) {
      throw new IllegalArgumentException("No MapFiles found in " + parentDir);
    }
  }

  /** @return the value for the key from the first MapFile that contains it, or null if none does */
  public V get(K key, V value) throws IOException {
    for (MapFile.Reader reader : readers) {
      if (reader.get(key, value) != null) {
        return value;
      }
    }
    log.debug("No value for key {}", key);
    return null;
  }

  @Override
  public void close() {
    for (MapFile.Reader reader : readers) {
      try {
        reader.close();
      } catch (IOException ioe) {
        log.warn("Unable to close reader", ioe);
      }
    }
  }
}
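
For what it's worth, a hypothetical usage could look like the following; the HDFS
path, the Text key layout (userId + "_" + itemId) and the NullWritable value type
are assumptions for illustration only, not part of the class above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;

public final class FilterLookupExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Open all part-* MapFiles under the (hypothetical) filter directory.
    MapFilesMap<Text, NullWritable> filter = new MapFilesMap<Text, NullWritable>(
        FileSystem.get(conf), new Path("/tmp/nonInterestedItems"), conf);
    // Before adding an item to a user's top-N list, check whether the pair is banned.
    boolean banned = filter.get(new Text(123L + "_" + 456L), NullWritable.get()) != null;
    System.out.println("banned? " + banned);
    filter.close();
  }
}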




Re: Adding feature: skip a user's non-interested items when generating recommendations for a user.

Posted by Sebastian Schelter <ss...@apache.org>.
Ok, you guys got me convinced :)

From a technical point of view two ways to implement that filter come to
my mind:

1) Just load the user/item pairs to filter into memory in the
AggregateAndRecommendReducer (easy but might not be scalable) like Han
Hui suggested
2) Have the AggregateAndRecommendReducer not pick only the top-K
recommendations but write all predicted preferences to disk. Add another
M/R step after that which joins recommendations and user/item filter
pairs to allow for custom rescoring/filtering

--sebastian
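
For what it's worth, option 2) could look roughly like the following reduce-side
join. The record layouts ("user,item,score" for predictions, "user_item" for filter
pairs) and the class names are made up for illustration and are not the real
RecommenderJob formats; a driver would feed both inputs (e.g. via MultipleInputs)
into the single JoinReducer.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public final class FilterJoin {

  /** Emits ("user_item", "P:score") for every predicted preference line "user,item,score". */
  public static class PredictionMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split(",");
      ctx.write(new Text(parts[0] + '_' + parts[1]), new Text("P:" + parts[2]));
    }
  }

  /** Emits ("user_item", "F") for every filter line "user_item". */
  public static class FilterMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text line, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(new Text(line.toString().trim()), new Text("F"));
    }
  }

  /** Keeps a prediction only if no filter record arrived for the same user/item pair. */
  public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text userItem, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      String score = null;
      boolean filtered = false;
      for (Text value : values) {
        String v = value.toString();
        if (v.startsWith("P:")) {
          score = v.substring(2);
        } else {
          filtered = true;   // a filter pair exists, so drop this recommendation
        }
      }
      if (!filtered && score != null) {
        ctx.write(userItem, new Text(score));
      }
    }
  }
}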



Re: Adding feature: skip a user's non-interested items when generating recommendations for a user.

Posted by Ted Dunning <te...@gmail.com>.
Sorry to chime in late, but removing items after recommendation isn't such a
crazy thing to do.

In particular, it is common to remove previously viewed items (for a period
of time).  Likewise, if the user says "don't show this again", it makes
sense to backstop the actual recommendation system with a UI limitation that
does a post-recommendation elimination.

Moreover, this approach has the great benefit that the results are very
predictable.  Exactly the requested/seen items will be eliminated and no
surprising effect on recommendations will occur.

That predictability is exactly the problem, though.  Generally you want a
bit more systemic effect for negative recommendations.  This is a really
sticky area, however, because negative recommendations often impart
information about positive preferences in addition to some level of negative
information.

I used an explicit filter at both Musicmatch and at Veoh.  Both systems
worked well.  Especially at Veoh, there was a lot of additional machinery
required to handle the related problem of anti-flooding.  That was done at
the UI level as well.


Re: Adding feature: skip a user's non-interested items when generating recommendations for a user.

Posted by Sean Owen <sr...@gmail.com>.
(Uncanny, I was just minutes before researching Grooveshark for
unrelated reasons... Good to hear from any company that does
recommendations and is willing to talk about it. I know of a number
that can't or won't, unfortunately.)

Yeah, sounds like we're all on the same page. One key point in what I
think everyone is talking about is that this is not simply removing
items *after* recommendations are computed. This risks removing most
or all recommended items. It needs to be done during the process of
selecting recommendations.

But beyond that, it's a simple idea and just a question of
implementation. It's "Rescorer" in the non-Hadoop code, which does
more than provide a way to remove items; rather, it generally rearranges
recommendations according to some logic. I think it's likely easy and
useful to imitate this with a simple optional Mapper/Reducer phase in
this nascent "RecommenderJob" pipeline that Sebastian is now helping
expand into something more configurable and general purpose.

Sean


Re: Adding feature: skip a user's non-interested items when generating recommendations for a user.

Posted by Chris Bates <ch...@gmail.com>.
Hi all,

I'm new to this forum and haven't seen the code you are talking about, so
take this with a grain of salt.  The way we handle "banned items" at
Grooveshark is to post-process the itemID pairs in Hive.  If a user dislikes
a recommended song/artist, an item pair is stored in HDFS and then when the
recs are computed, those banned user-item pairs are taken into account.
Here is an example query:

SELECT DISTINCT st.uid, st.simuid, IF(b.uid=st.uid,1,0) as banned  FROM
streams_u2u st LEFT OUTER JOIN bannedsimusers b ON (b.simuid=st.simuid);

That query will print out a 1 or a 0 if the recommended item pair is banned
or not.  Hive also supports case statements (I think), so you can make a
range of "banned-ness" I guess.  Just another solution to the "dislike"
problem.

Chris


Re: Adding feature: skip a user's non-interested items when generating recommendations for a user.

Posted by Sean Owen <sr...@gmail.com>.
Sebastian is right that in this case, you might well model these as
preferences with low value. It's reasonable, but, I also agree that
somehow an 'ignored' recommendation does not necessarily mean the same
as a low preference. There are some situations where you might want to
exclude items from recommendation for many reasons (e.g. it's
currently out of stock).

This is what the Rescorer does in the non-distributed code. There is
not yet any counterpart in the distributed code.

It would be fairly simple. You just need a job to modify the estimated
preference vector after recommendation, in whatever way you want. Here
you just clear the entries in the vector for anything you don't want
recommended. Any other transformation is possible.

This is how a function like this ought to look in Mahout, I think
-- some kind of RescorerMapper / RescorerReducer. If you can make a
patch along those lines, I'm sure we could integrate it.

Sean
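
If it helps, a very rough sketch of that RescorerMapper idea could look like the
following. It is not an existing Mahout class; the LongWritable keys, the
hard-coded in-memory filter and the use of item indices (rather than item IDs)
are simplifying assumptions.

import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class RescorerMapper
    extends Mapper<LongWritable, VectorWritable, LongWritable, VectorWritable> {

  // userID -> item indices that must not be recommended; a real job would load this from HDFS.
  private final Map<Long, Set<Integer>> filtered = new HashMap<Long, Set<Integer>>();

  @Override
  protected void setup(Context context) {
    // Hard-coded for illustration only: user 123 never gets item index 42 recommended.
    Set<Integer> items = new HashSet<Integer>();
    items.add(42);
    filtered.put(123L, items);
  }

  @Override
  protected void map(LongWritable userID, VectorWritable estimatedPrefs, Context context)
      throws IOException, InterruptedException {
    Vector prefs = estimatedPrefs.get();
    Set<Integer> itemsToClear = filtered.get(userID.get());
    if (itemsToClear != null) {
      for (int itemIndex : itemsToClear) {
        if (itemIndex < prefs.size()) {
          prefs.set(itemIndex, 0.0);   // clear the entry so this item cannot be recommended
        }
      }
    }
    context.write(userID, new VectorWritable(prefs));
  }
}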


Re: Adding feature: skip a user's non-interested items when generating recommendations for a user.

Posted by han henry <hu...@gmail.com>.
Hi, Sebastian

Actually we have three feedback types:

1) The user likes the recommendation more: we give a higher score and add a new
preference; this preference is used to calculate similarity.

2) The user likes the recommendation less: we give a lower score and add a new
preference; this preference is used to calculate similarity.

3) The user just wants to remove the item: because we always show a fixed number
of recommendations, a user who wants to see more new recommendations just needs
to remove the item from the list (he can bookmark the item for future viewing).

    So we add a new preference; this preference is not used to calculate
similarity, but we must not recommend that item again.

My proposal is for the third feedback type. For the first and second types, it
is enough to add one preference.

---Henry Han


2010/8/23 Sebastian Schelter <ss...@apache.org>

> Hi,
>
> I think it is an interesting feature. But maybe it is not the best way
> to exclude the item at the end of the recommendation process.
>
> Another way could be to just add a preference with a negative rating to
> the input data whenever a user rejects an item. That way this would
> provide more information about the user and the item would automatically
> be excluded from the output of the RecommenderJob as the user has
> already seen this item.
>
> The question is whether it's conceptually okay to add a negative
> preference whenever the user rejects an item. Any thoughts on this?
>
> --sebastian
>
> Am 23.08.2010 15:57, schrieb han henry:
> > Hi,All
> >
> > Sometimes user's dislikes some recommendation we generated ,he/she
> > does not want to see the recommended items again.
> >
> > Here is a example from Amazon.com (see the attachment ).
> >
> > I have written one patch for it.the logic as following :
> >
> > 1) Dump user's non-interested items to HDFS, format like
> > userId+"_"+item_id. before we run RecommenderJob
> > 2) Load user's invalid data to HashMap when
> > AggregateAndRecommendReducer setup
> > 3) Skip user's non-interested items when choose TOP N recommendations
> > for user.
> >
> > Does it make sense and can merge to the repository ?
> >
> >
>
>

Re: Adding feature: skip a user's non-interested items when generating recommendations for a user.

Posted by Sebastian Schelter <ss...@apache.org>.
Hi,

I think it is an interesting feature. But maybe it is not the best way
to exclude the item at the end of the recommendation process.

Another way could be to just add a preference with a negative rating to
the input data whenever a user rejects an item. That way this would
provide more information about the user and the item would automatically
be excluded from the output of the RecommenderJob as the user has
already seen this item.
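
For example, with the comma-delimited userID,itemID,value input that
RecommenderJob consumes, rejecting an item could simply mean appending a line
with a low or negative value (the concrete IDs and values below are made up):

123,789,4.5
123,101,3.0
123,456,-1.0

Here the last line records that user 123 rejected item 456.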

The question is whether it's conceptually okay to add a negative
preference whenever the user rejects an item. Any thoughts on this?

--sebastian
