
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/09 07:54:33 UTC

[GitHub] [spark] ahmed-mahran opened a new pull request, #38996: [SPARK-41008][MLLIB] Follow-up isotonic regression features deduplica…

ahmed-mahran opened a new pull request, #38996:
URL: https://github.com/apache/spark/pull/38996

### What changes were proposed in this pull request?

A follow-up to https://github.com/apache/spark/pull/38966 that updates the relevant documentation and removes a redundant sort key.

### Why are the changes needed?

For isotonic regression, https://github.com/apache/spark/pull/38966 introduced a different way of handling ties between points with repeated feature values: points that share the same feature value are aggregated into a single point whose label is the weighted average of their labels.
- This only requires points to be sorted by feature rather than by feature and then label, so label should be removed as a secondary sort key.
- Isotonic regression documentation needs to be updated to reflect the new behavior.
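The aggregation described above can be illustrated with a minimal Python sketch. It is not Spark's implementation (that lives in Scala in MLlib); the function name `deduplicate_features`, the `(label, feature, weight)` tuple layout, the requirement of strictly positive weights, and the choice to give each merged point the sum of the tied weights are all assumptions made for illustration. It also shows why a label sort key is redundant: the tied points are only ever combined through a weighted average, whose value does not depend on their order.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical layout for this sketch: (label, feature, weight).
Point = Tuple[float, float, float]


def deduplicate_features(points: List[Point]) -> List[Point]:
    """Collapse points that share a feature value into a single point.

    The merged label is the weighted average of the tied labels. The merged
    weight is the sum of the tied weights (an assumption of this sketch).
    Weights are assumed to be strictly positive.
    """
    if any(w <= 0 for _, _, w in points):
        raise ValueError("this sketch assumes strictly positive weights")

    # Sort by feature only. No secondary label key is needed: the weighted
    # average below gives the same result (up to floating-point rounding)
    # whatever order the tied points end up in.
    ordered = sorted(points, key=lambda p: p[1])

    merged: List[Point] = []
    for feature, group in groupby(ordered, key=lambda p: p[1]):
        ties = list(group)
        total_weight = sum(w for _, _, w in ties)
        weighted_label = sum(label * w for label, _, w in ties) / total_weight
        merged.append((weighted_label, feature, total_weight))
    return merged


points = [(1.0, 2.0, 1.0), (3.0, 2.0, 3.0), (5.0, 1.0, 2.0)]
print(deduplicate_features(points))  # [(5.0, 1.0, 2.0), (2.5, 2.0, 4.0)]
```

Reversing or shuffling the input leaves the grouping unchanged and, up to rounding, the merged labels too, which is why ordering by feature alone is enough before the isotonic fit.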

### Does this PR introduce _any_ user-facing change?

Yes, the isotonic regression documentation is updated. It described how the algorithm behaves when the input contains points with repeated feature values; since that behavior has changed, the documentation needs to describe the new behavior.

### How was this patch tested?

Existing tests pass. No new tests were added because the existing tests are already comprehensive.

@srowen

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
