Posted to commits@pinot.apache.org by "aw381246 (via GitHub)" <gi...@apache.org> on 2024/02/15 15:00:11 UTC

[I] DISTINCTCOUNTMV returns incorrect value when field is included in select/group by [pinot]

aw381246 opened a new issue, #12429:
URL: https://github.com/apache/pinot/issues/12429

   If the field passed to the `DISTINCTCOUNTMV` function is also included in the select / group by list, the query returns incorrect counts.
   
   In the case below, the result for each row should be a distinct count of 1 instead of 2.
   
   If the array has n items, the distinct count returns n instead of 1.
   
   ![image](https://github.com/apache/pinot/assets/70593020/8aa67b27-c332-447a-88ed-f28523216a9a)
   ![image](https://github.com/apache/pinot/assets/70593020/2c7101f9-85a5-44d7-88df-f856bc6bb16a)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


Re: [I] DISTINCTCOUNTMV returns incorrect value when field is included in select/group by [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #12429:
URL: https://github.com/apache/pinot/issues/12429#issuecomment-2038564631

   Pinot `MV` (multi-value) semantics work as follows:
   - In a filter, it is similar to unnest, but each row can be matched only once (e.g. `IN (1, 2)` matches the row once even if the value contains both 1 and 2)
   - In group-by, each value is treated as a separate group (same as unnest)
   - In projection, the whole MV is projected as an array (different from unnest)
   - `VALUE_IN` was built as a workaround to apply an extra filter during projection
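
The semantics listed above can be sketched with a small toy model. This is plain Python, not Pinot code: the `rows` data, the `hosts` column name, and all four helper functions are hypothetical, and `value_in` is only a stand-in for the idea of filtering values during projection, not the real function's implementation.

```python
# Toy model of the MV semantics described above; this is not Pinot code.
# Each row is a dict; "hosts" is a hypothetical multi-value (MV) column.
rows = [
    {"id": 1, "hosts": ["a", "b"]},
    {"id": 2, "hosts": ["b"]},
]


def filter_in(rows, column, wanted):
    """Filter: a row matches at most once, however many of its values match."""
    return [r for r in rows if any(v in wanted for v in r[column])]


def group_by_mv(rows, column):
    """Group-by: every value of the MV column becomes its own group (like unnest)."""
    groups = {}
    for r in rows:
        for v in r[column]:
            groups.setdefault(v, []).append(r)
    return groups


def project(rows, column):
    """Projection: the whole MV is returned as one array per row."""
    return [list(r[column]) for r in rows]


def value_in(values, wanted):
    """Hypothetical stand-in for an extra per-value filter applied during projection."""
    return [v for v in values if v in wanted]


matched = filter_in(rows, "hosts", {"a", "b"})  # row 1 appears once, not twice
groups = group_by_mv(rows, "hosts")             # row 1 lands in both group "a" and group "b"
projected = project(rows, "hosts")              # arrays stay whole
```

In this model, row 1 belongs to two groups but still carries its full two-element array, which is why an aggregation over the projected array inside a group sees both values unless something like `value_in` trims it first.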




Re: [I] DISTINCTCOUNTMV returns incorrect value when field is included in select/group by [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #12429:
URL: https://github.com/apache/pinot/issues/12429#issuecomment-2030452607

   This is similar to #12230, and this behavior is expected because of how Pinot executes queries. We built `VALUE_IN` to work around this problem.




Re: [I] DISTINCTCOUNTMV returns incorrect value when field is included in select/group by [pinot]

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on issue #12429:
URL: https://github.com/apache/pinot/issues/12429#issuecomment-2030650148

   I don't fully follow. What do you want to achieve with the query?




Re: [I] DISTINCTCOUNTMV returns incorrect value when field is included in select/group by [pinot]

Posted by "aw381246 (via GitHub)" <gi...@apache.org>.
aw381246 commented on issue #12429:
URL: https://github.com/apache/pinot/issues/12429#issuecomment-2030514242

   @Jackie-Jiang, if I understand `VALUE_IN` correctly, I could specify a particular "host" in the query above, get only the first row, and the distinct count would be 1, right?  But that doesn't solve the problem of returning _every_ host with the correct `DISTINCTCOUNTMV` result.  In other words, if "1.27.12.151" is in an MV field with either 0 or 100 other hosts, `DISTINCTCOUNTMV` should return 1 either way.




Re: [I] DISTINCTCOUNTMV returns incorrect value when field is included in select/group by [pinot]

Posted by "aw381246 (via GitHub)" <gi...@apache.org>.
aw381246 commented on issue #12429:
URL: https://github.com/apache/pinot/issues/12429#issuecomment-2031945191

   If you group by the same column that is in the distinct count, every row in the result should have a distinct count of 1.  The only reason Pinot returns a value greater than 1 is that the column is an MV column, which has 2 values in this case.  Pinot unnests the MV column when you group by it, but the distinct count seems to be computed before the unnest instead of after it.
   
   Here's a simple example I created in SQL Server: 
   ![image](https://github.com/apache/pinot/assets/70593020/bb79c127-51b1-4dc9-bc14-c5c054ea2f24)
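
The difference between counting before and after the unnest can be illustrated with a toy model. This is a sketch of the hypothesis stated above, not Pinot's actual execution path; the `rows` data, the `hosts` column name, and both function names are made up for illustration and are not taken from the screenshots.

```python
# Toy data: a single row whose hypothetical MV column "hosts" holds two values.
rows = [{"hosts": ["h1", "h2"]}]


def count_before_unnest(rows, column):
    """Group by each value, but aggregate over the row's whole array."""
    seen = {}
    for r in rows:
        for key in r[column]:
            # Assumption: the aggregation sees every value in the array.
            seen.setdefault(key, set()).update(r[column])
    return {k: len(v) for k, v in seen.items()}


def count_after_unnest(rows, column):
    """Unnest first, then aggregate only the value that defines the group."""
    seen = {}
    for r in rows:
        for key in r[column]:
            seen.setdefault(key, set()).add(key)
    return {k: len(v) for k, v in seen.items()}


observed = count_before_unnest(rows, "hosts")  # {"h1": 2, "h2": 2}
expected = count_after_unnest(rows, "hosts")   # {"h1": 1, "h2": 1}
```

Under these assumptions the first function reproduces the reported count of n for an array of n items, while the second gives the count of 1 per group that a SQL engine with an explicit unnest would produce.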
   

