You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/05/08 16:10:04 UTC

[GitHub] [spark] JoshRosen commented on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function

JoshRosen commented on issue #24557: [SPARK-27653][SQL] Add max_by() SQL aggregate function
URL: https://github.com/apache/spark/pull/24557#issuecomment-490549494
 
 
   Hi @viirya,
   
   Thanks for working on this! 
   
   I had a few quick questions:
   
   - Could you also implement `min_by(x, y)`?
     - It looks like you might be able to share most of the code except for replacing `GreaterThan` and `greatest`, so maybe this difference could be abstracted away via a shared abstract superclass.
   - Presto also has three-argument versions of `max_by` / `min_by`:
   
     > max_by(x, y, n) → array<[same as x]>
     > Returns n values of x associated with the n largest of all input values of y in descending order of y.
   
     I don't think we need to do this version now, especially since we can always add it in a separate followup PR (which is what Presto originally did: https://github.com/prestodb/presto/issues/3620)
   - Were there any bugs in older implementations of Presto version that we might have replicated here? Or Presto tests for edge-cases that we could emulate?
     - https://github.com/prestodb/presto/issues/7646 discusses using rows / structs as the ordering value. I suspect that would work already in your implementation, but it might be nice to check.
     - https://github.com/prestodb/presto/issues/2040 discusses a problem with null ordering values. It looks like you're testing that case, but I just wanted to double-check that we've covered the same edge-case there.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org