You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "jasperjiaguo (via GitHub)" <gi...@apache.org> on 2023/05/08 02:15:33 UTC

[GitHub] [pinot] jasperjiaguo commented on pull request #10636: Support for ARG_MIN and ARG_MAX Functions

jasperjiaguo commented on PR #10636:
URL: https://github.com/apache/pinot/pull/10636#issuecomment-1537644964

   > Thinking more on my previous comment.....
   > 
   > May be one way to workaround the NULL business is to output array when we have duplicates where the min and max is happening ?
   > 
   > This query
   > 
   > ```
   > SELECT 
   > argmin(intCol, **stringCol**),  
   > argmin(intCol, **doubleCol**), 
   > sum(doubleCol)  
   > FROM table
   > ```
   > 
   > can output
   > 
   > argmin(intCol, stringCol)	argmin(intCol, doubleCol)	sum(doubleCol)
   > ["a2", "a11"]	[2.0, 3.0]	9.0
   > Similarly, the following query
   > 
   > ```
   > SELECT 
   > intCol, 
   > argmin(longCol, **doubleCol**),  
   > argmin(longCol, **longCol**)   
   > FROM table 
   > GROUP BY intCol
   > ```
   > 
   > Can output
   > 
   > intCol	argmin(longCol, doubleCol)	argmin(longCol, longCol)
   > 1	[2.0, 3.0]	2
   > 2	4.0	1
   > This is probably a more intuitive way to reason about response and is more SQL friendly imo and avoids populating NULLs.
   > 
   > @jasperjiaguo wdyt ?
   
   Agreed that null filling can be confusing for group ids. I have made a change for the group id value filling and it now behaves like:
   ```
   SELECT 
   intCol, 
   argmin(longCol, **doubleCol**),  
   argmin(longCol, **longCol**)   
   FROM table 
   GROUP BY intCol
   ```
   
   intCol | argmin(longCol, doubleCol) | argmin(longCol, longCol)
   -- | -- | --
   1 | 2.0 | 2
   1 | 3.0 | 2
   2 | 4.0 | 1
   
   SELECT 
   argmin(intCol, **stringCol**),  
   argmin(intCol, **doubleCol**), 
   sum(doubleCol)  
   FROM table
   
   argmin(intCol, stringCol) | argmin(intCol, doubleCol) | sum(doubleCol)
   -- | -- | --
   "a2" | 2.0 | 9.0
   "a11"* | 3.0 | 9.0
   
   which is essentially flattened view of 
   
   -- | -- | --
   1 | [2.0, 3.0] | 2
   2 | 4.0 | 1
   
   and 
   
   -- | -- | --
   ["a2", "a11"] | [2.0, 3.0] | 9.0
   
   respectively
   
   Meanwhile, I have also considered the option array fashion of returning multiple rows of output, there are a few reasons I didn't use it:
   
   1. It wouldn't work for all MV types as we currently don't have sth like ARRAY[ARRAY[INT]] for returned results
   2. It would be easier for the user to parse the result when this is flattened, as the user side will not need to flatten + align them on their own when they are projecting multiple cols.
   3. Using the flattened view will keep the output column type the same as the data column type, which I feel is cleaner.
   
   cc @siddharthteotia @somandal 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org