Posted to issues@spark.apache.org by "Nick Pentreath (JIRA)" <ji...@apache.org> on 2017/03/06 09:07:32 UTC

[jira] [Comment Edited] (SPARK-14409) Investigate adding a RankingEvaluator to ML

    [ https://issues.apache.org/jira/browse/SPARK-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896933#comment-15896933 ] 

Nick Pentreath edited comment on SPARK-14409 at 3/6/17 9:06 AM:
----------------------------------------------------------------

I've thought about this a lot over the past few days, and I think the approach should be in line with that suggested by [~roberto.mirizzi] & [~danilo.ascione].

*Goal*

Provide a DataFrame-based ranking evaluator that is general enough to handle common scenarios such as recommendations (ALS), search ranking, and ad click prediction, using ranking metrics (for illustration, see recent Kaggle competitions: [Outbrain Ad Clicks using MAP|https://www.kaggle.com/c/outbrain-click-prediction#evaluation], [Expedia Hotel Search Ranking using NDCG|https://www.kaggle.com/c/expedia-personalized-sort#evaluation]).

*RankingEvaluator input format*

{{evaluate}} would take a {{DataFrame}} with columns:

* {{queryCol}} - the column containing "query id" (e.g. "query" for cases such as search ranking; "user" for recommendations; "impression" for ad click prediction/ranking, etc);
* {{documentCol}} - the column containing "document id" (e.g. "document" in search, "item" in recommendation, "ad" in ad ranking, etc);
* {{labelCol}} (or maybe {{relevanceCol}} to be more precise) - the column containing the true relevance score for a query-document pair (e.g. in recommendations this would be the "rating"). This column will only be used for filtering out "irrelevant" documents from the ground-truth set (see Param {{goodThreshold}} mentioned [above|https://issues.apache.org/jira/browse/SPARK-14409?focusedCommentId=15826901&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15826901]);
* {{predictionCol}} - the column containing the predicted relevance score for a query-document pair. The predicted ids will be ordered by this column when computing ranking metrics (order matters for the predictions, but generally not for the ground truth, which is treated as a set).

The reasoning is that this format is flexible & generic enough to encompass the diverse use cases mentioned above.

Here is an illustrative example from recommendations as a special case:

{code}
+------+-------+------+----------+
|userId|movieId|rating|prediction|
+------+-------+------+----------+
|   230|    318|   5.0| 4.2403245|
|   230|   3424|   4.0|      null|
|   230|  81191|  null|  4.317455|
+------+-------+------+----------+
{code}

You will notice that the {{rating}} and {{prediction}} columns can be {{null}}. This is by design. There are three cases shown above:

# 1st row indicates a query-document (user-item) pair that occurs in *both* the ground-truth set and the top-k predictions;
# 2nd row indicates a user-item pair that occurs in the ground-truth set, but *not* in the top-k predictions;
# 3rd row indicates a user-item pair that *does not* occur in the ground-truth set, but *does* occur in the top-k predictions.

*Note* that while the input technically allows both of these columns to be {{null}}, in practice that won't occur, since a query-document pair must occur in at least one of the ground-truth set or the predictions. If such a row does occur for some reason, it can be ignored.
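
For concreteness, here is a minimal sketch (not part of the proposal itself) of how such an input could be assembled, assuming hypothetical {{groundTruth}} and {{topK}} DataFrames with the example column names above:

{code}
import org.apache.spark.sql.DataFrame

// Purely illustrative: `groundTruth` holds the (userId, movieId, rating) pairs and `topK`
// holds the top-k (userId, movieId, prediction) pairs per user (e.g. produced from ALS).
// A full outer join yields exactly the three row types above: matched pairs,
// ground-truth-only pairs (null prediction) and prediction-only pairs (null rating).
def buildEvaluatorInput(groundTruth: DataFrame, topK: DataFrame): DataFrame =
  groundTruth.select("userId", "movieId", "rating")
    .join(topK.select("userId", "movieId", "prediction"), Seq("userId", "movieId"), "outer")
{code}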

*Evaluator approach*

The evaluator will apply a window function partitioned by {{queryCol}} and ordered by {{predictionCol}} within each query. Then {{collect_list}} can be used to arrive at the following intermediate format:

{code}
+------+--------------------+--------------------+
|userId|         true_labels|    predicted_labels|
+------+--------------------+--------------------+
|   230|[318, 3424, 7139,...|[81191, 93040, 31...|
+------+--------------------+--------------------+
{code}
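
A rough sketch of that step (mechanics only, not the proposed API), assuming the example column names above and a hypothetical relevance threshold {{goodThreshold}}:

{code}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Sketch: `df` is the input DataFrame in the format shown earlier
// (userId, movieId, rating, prediction); `goodThreshold` is the hypothetical
// relevance-threshold Param used to define the ground-truth set.
def toIntermediate(df: DataFrame, goodThreshold: Double): DataFrame = {
  // unbounded frame so collect_list sees the full, prediction-ordered list per query
  val byPrediction = Window
    .partitionBy("userId")
    .orderBy(col("prediction").desc)
    .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)

  df
    // ground truth: ids whose true relevance passes the threshold (order is irrelevant)
    .withColumn("true_labels",
      collect_list(when(col("rating") >= goodThreshold, col("movieId"))).over(byPrediction))
    // predictions: ids that actually have a predicted score, in descending score order
    .withColumn("predicted_labels",
      collect_list(when(col("prediction").isNotNull, col("movieId"))).over(byPrediction))
    .select("userId", "true_labels", "predicted_labels")
    .dropDuplicates("userId")
}
{code}

Truncating the predictions to the top k could be done either here (e.g. via {{row_number}} over the same window) or inside the metric computations themselves.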

*Relationship to RankingMetrics*

Technically the intermediate format above is the same as the format used by {{RankingMetrics}}, and perhaps we could simply wrap the {{mllib}} version. *Note* however that the {{mllib}} class is parameterized by the type of "document": {code}RankingMetrics[T]{code}

I believe for the generic case we must support both {{NumericType}} and {{StringType}} for the id columns (rather than restricting them to {{Int}} as in Danilo's & Roberto's versions above). So either:
# the evaluator must be similarly parameterized; or
# we will need to re-write the ranking metric computations as UDFs along the lines of: {code}udf { (predicted: Seq[Any], actual: Seq[Any]) => ... }{code}

I strongly prefer option #2 as it is more flexible and in keeping with the DataFrame style of Spark ML components (as a side note, this will give us a chance to review the implementations & naming of metrics, since there are some issues with a few of the metrics).
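
To make option #2 concrete, here is a rough sketch of one metric (precision at k) written as a UDF over the intermediate format above; the metric definitions and naming are exactly what would be reviewed, and the behaviour of {{Seq[Any]}} inputs across the supported id types would need to be verified:

{code}
import org.apache.spark.sql.functions.{avg, col, udf}

// Illustrative only: precision@k over (predicted_labels, true_labels).
// Seq[Any] keeps the UDF agnostic to the id type (numeric or string).
def precisionAtK(k: Int) = udf { (predicted: Seq[Any], actual: Seq[Any]) =>
  if (predicted == null || actual == null || actual.isEmpty) {
    0.0
  } else {
    val actualSet = actual.toSet
    // note: conventions differ on the denominator (k vs. math.min(k, predicted.length));
    // this is exactly the kind of definitional point to settle during the rewrite
    predicted.take(k).count(actualSet.contains).toDouble / k
  }
}

// e.g. averaging the per-query metric over the intermediate DataFrame:
// intermediate.select(avg(precisionAtK(10)(col("predicted_labels"), col("true_labels"))))
{code}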


That is my proposal (sorry Yong, this is quite different now from the work you've done in your PR). If Yong or Danilo has time to update his PR in this direction, let me know.

cc [~josephkb] FYI

Thanks!


> Investigate adding a RankingEvaluator to ML
> -------------------------------------------
>
>                 Key: SPARK-14409
>                 URL: https://issues.apache.org/jira/browse/SPARK-14409
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Nick Pentreath
>            Priority: Minor
>
> {{mllib.evaluation}} contains a {{RankingMetrics}} class, while there is no {{RankingEvaluator}} in {{ml.evaluation}}. Such an evaluator can be useful for recommendation evaluation (and potentially in other settings).
> Should be thought about in conjunction with adding the "recommendAll" methods in SPARK-13857, so that top-k ranking metrics can be used in cross-validators.


