Posted to reviews@spark.apache.org by MLnick <gi...@git.apache.org> on 2017/07/27 09:00:34 UTC

[GitHub] spark issue #18748: [SPARK-20679][ML] Support recommending for a subset of u...

Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/18748
  
    **Note 1:** this implementation must perform a `distinct` on the input data frame's id column to guarantee correct results: otherwise each duplicate id would generate its own copy of the same recommendations, and the output would contain duplicate rows. This could alternatively be left to the user by assuming the input data frame contains no duplicates, but for now I've opted for the safest option, even though it introduces some inefficiency. See the sketch below.
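    A minimal sketch (assumed names, not the actual PR code) of why the `distinct` matters, with a hypothetical `userFactors` DataFrame standing in for the model's `(id, features)` factor table:
    
    ```scala
    import org.apache.spark.sql.DataFrame
    
    // Hypothetical helper: select the factor rows for a requested id subset.
    def factorsForSubset(dataset: DataFrame, idCol: String, userFactors: DataFrame): DataFrame = {
      // Without distinct(), a duplicated id in `dataset` joins the same factor
      // row once per occurrence, yielding duplicate recommendations downstream.
      val ids = dataset.select(idCol).distinct()
      ids.join(userFactors, ids(idCol) === userFactors("id"))
        .select(userFactors("id"), userFactors("features"))
    }
    ```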
    
    **Note 2:** this does not support `coldStartStrategy`, so no recommendations are returned for ids in the input data frame that are not contained in the model (analogous to `coldStartStrategy=drop` for `transform`). I believe this makes the most sense: supporting something like the `nan` option would be somewhat involved and would not add much value. It could be done, however (it would need to return `null` rows in the `recommendation` column for those ids). Later, if other cold start strategies are supported (e.g. average factor vectors), this method could return recommendations even for ids not contained in the model.
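    To illustrate the drop-like behaviour with toy data (a sketch, not the PR code): the inner join against the factor table simply yields no row for an id the model has not seen:
    
    ```scala
    import org.apache.spark.sql.SparkSession
    
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._
    
    // The model knows ids 1 and 2; the request also includes the unseen id 99.
    val requested = Seq(1, 2, 99).toDF("id")
    val userFactors = Seq((1, Array(0.1f, 0.2f)), (2, Array(0.3f, 0.4f))).toDF("id", "features")
    
    // Inner join: id 99 produces no output row, analogous to coldStartStrategy=drop.
    requested.distinct().join(userFactors, "id").show()
    ```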
    
    cc @srowen @jkbradley @yanboliang @mpjlu @sethah 


