Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2023/01/14 07:50:29 UTC

[GitHub] [pinot] ankitsultana opened a new issue, #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets

ankitsultana opened a new issue, #10129:
URL: https://github.com/apache/pinot/issues/10129

   Repro:
   
   ```
   SELECT
     COUNT(*)
   FROM
     baseballStats_OFFLINE AS A
     JOIN baseballStats_OFFLINE AS B ON A.playerID = B.playerID
   WHERE
     A.hits > 10 AND B.hits < 5
   ```
   
   This query ends up reading all columns in the table-scan stage because Calcite does not create a projection node. (I will add more repro queries soon.)
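   To make the cost concrete, below is a toy Java model of what the leaf table scan has to read. It is not Pinot's planner code: `ScanColumns`, `columnsRead`, and the five-column schema in the usage note are hypothetical stand-ins for `baseballStats`. Without a projection above the scan, full rows flow upward and every column is read; with one, only the columns that upstream operators reference (here the join key `playerID` and the filter column `hits`) are needed.
   
   ```java
   import java.util.LinkedHashSet;
   import java.util.List;
   import java.util.Set;
   
   /** Toy model of the columns a table-scan stage reads (hypothetical, not Pinot code). */
   final class ScanColumns {
     private ScanColumns() {}
   
     /**
      * @param schema        all columns of the table, in schema order
      * @param referenced    columns used by operators above the scan (join keys, filters)
      * @param hasProjection whether a projection node sits directly above the scan
      * @return the columns the scan must read, in schema order
      */
     static Set<String> columnsRead(List<String> schema, Set<String> referenced, boolean hasProjection) {
       if (!hasProjection) {
         // No projection: the scan emits full rows, so every column is read.
         return new LinkedHashSet<>(schema);
       }
       // With a projection: keep only the columns that something upstream references.
       Set<String> result = new LinkedHashSet<>();
       for (String column : schema) {
         if (referenced.contains(column)) {
           result.add(column);
         }
       }
       return result;
     }
   }
   ```
   
   With a hypothetical schema of `playerID`, `yearID`, `teamID`, `hits`, `homeRuns`, calling `columnsRead(schema, Set.of("playerID", "hits"), false)` returns all five columns, while passing `true` returns only `playerID` and `hits`.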
   
   I am testing out a few approaches for a fix in this PR: https://github.com/apache/pinot/pull/10122
   
   Btw, this is what GPT recommends:
   
   [screenshot of GPT's recommendation; image not reproduced]
   
   cc: @walterddr
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] ankitsultana commented on issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets

Posted by GitBox <gi...@apache.org>.
ankitsultana commented on issue #10129:
URL: https://github.com/apache/pinot/issues/10129#issuecomment-1398417863

   @walterddr: Can we consider using `RelFieldTrimmer`?
   
   I raised this: https://github.com/apache/pinot/pull/10156
   
   It seems to work, and I am testing it on our clusters. The PR is not final, as I am still reading up on this feature; the code also calls out some caveats about it.
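   For context, `RelFieldTrimmer` walks the plan and replaces each node with a slimmed-down version that projects only the columns its consumer requires. The sketch below models only the required-column bookkeeping, top-down, on a toy plan shaped like the repro query. The classes are hypothetical; the real trimmer works on field ordinals and rebuilds the `RelNode` tree, which this toy does not do.
   
   ```java
   import java.util.HashMap;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Map;
   import java.util.Set;
   import java.util.TreeSet;
   
   /** Toy top-down column trimming in the spirit of RelFieldTrimmer (hypothetical, not Calcite code). */
   final class ToyFieldTrimmer {
     private ToyFieldTrimmer() {}
   
     /** A plan node; columns are qualified as "alias.column". */
     interface Node {
       /** Pushes the parent's required columns down and records what each scan must read. */
       void trim(Set<String> required, Map<String, Set<String>> scanNeeds);
     }
   
     /** Leaf scan: reads only the required columns qualified with its own alias. */
     static final class Scan implements Node {
       private final String alias;
       private final List<String> columns;
   
       Scan(String alias, List<String> columns) {
         this.alias = alias;
         this.columns = columns;
       }
   
       @Override
       public void trim(Set<String> required, Map<String, Set<String>> scanNeeds) {
         Set<String> needed = new TreeSet<>();
         for (String column : columns) {
           if (required.contains(alias + "." + column)) {
             needed.add(column);
           }
         }
         scanNeeds.put(alias, needed);
       }
     }
   
     /** Filter (one input) or join (two inputs): parent's columns plus its condition's columns. */
     static final class FilterOrJoin implements Node {
       private final Set<String> conditionColumns;
       private final List<Node> inputs;
   
       FilterOrJoin(Set<String> conditionColumns, List<Node> inputs) {
         this.conditionColumns = conditionColumns;
         this.inputs = inputs;
       }
   
       @Override
       public void trim(Set<String> required, Map<String, Set<String>> scanNeeds) {
         Set<String> childRequired = new HashSet<>(required);
         childRequired.addAll(conditionColumns);
         for (Node input : inputs) {
           input.trim(childRequired, scanNeeds);
         }
       }
     }
   
     /** Aggregate: its input only needs the group keys and aggregate arguments. */
     static final class Aggregate implements Node {
       private final Set<String> usedColumns;
       private final Node input;
   
       Aggregate(Set<String> usedColumns, Node input) {
         this.usedColumns = usedColumns;
         this.input = input;
       }
   
       @Override
       public void trim(Set<String> required, Map<String, Set<String>> scanNeeds) {
         // The parent refers to aggregate outputs, so `required` is not forwarded.
         input.trim(new HashSet<>(usedColumns), scanNeeds);
       }
     }
   
     /** Returns, for each scan alias, the columns that scan has to read. */
     static Map<String, Set<String>> columnsPerScan(Node root) {
       Map<String, Set<String>> scanNeeds = new HashMap<>();
       root.trim(new HashSet<>(), scanNeeds);
       return scanNeeds;
     }
   }
   ```
   
   With an `Aggregate` for `COUNT(*)` (no used columns) over a filter on `A.hits` and `B.hits`, over a join on `A.playerID = B.playerID`, `columnsPerScan` reports that each scan only needs `hits` and `playerID`.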


[GitHub] [pinot] walterddr commented on issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #10129:
URL: https://github.com/apache/pinot/issues/10129#issuecomment-1384333850

   sounds like a good idea. :-P


[GitHub] [pinot] walterddr commented on issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #10129:
URL: https://github.com/apache/pinot/issues/10129#issuecomment-1385871610

   We can leverage Calcite's `AggregateExtractProjectRule` to split the aggregate into an aggregate + project.
   
   This is only needed for `COUNT(*)`; I realized that `SUM` and other aggregates already put the project in place properly.
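   The bookkeeping behind splitting an aggregate into aggregate + project can be sketched on field ordinals: collect the input fields referenced by the group keys and aggregate arguments, project only those, and renumber the aggregate's references to point at the projection's output. This is a hypothetical toy (`ExtractProject` and `split` are not Calcite APIs), and the ordinals in the usage note are made up for illustration.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Map;
   import java.util.TreeMap;
   import java.util.TreeSet;
   
   /** Toy "extract a project from an aggregate" on field ordinals (hypothetical, not Calcite code). */
   final class ExtractProject {
     private ExtractProject() {}
   
     /** Input ordinals kept by the new project, plus the aggregate rewritten on top of it. */
     static final class Split {
       final List<Integer> projectedOrdinals;
       final List<Integer> newGroupKeys;
       final List<List<Integer>> newAggArgs;
   
       Split(List<Integer> projectedOrdinals, List<Integer> newGroupKeys, List<List<Integer>> newAggArgs) {
         this.projectedOrdinals = projectedOrdinals;
         this.newGroupKeys = newGroupKeys;
         this.newAggArgs = newAggArgs;
       }
     }
   
     static Split split(List<Integer> groupKeys, List<List<Integer>> aggArgs) {
       // 1. Collect every input ordinal the aggregate references, in ascending order.
       TreeSet<Integer> used = new TreeSet<>(groupKeys);
       for (List<Integer> args : aggArgs) {
         used.addAll(args);
       }
       // 2. The project keeps only those ordinals; map old ordinal -> position in the project.
       Map<Integer, Integer> remap = new TreeMap<>();
       for (int ordinal : used) {
         remap.put(ordinal, remap.size());
       }
       // 3. Rewrite group keys and aggregate arguments against the project's output.
       List<Integer> newGroupKeys = new ArrayList<>();
       for (int key : groupKeys) {
         newGroupKeys.add(remap.get(key));
       }
       List<List<Integer>> newAggArgs = new ArrayList<>();
       for (List<Integer> args : aggArgs) {
         List<Integer> mapped = new ArrayList<>();
         for (int arg : args) {
           mapped.add(remap.get(arg));
         }
         newAggArgs.add(mapped);
       }
       return new Split(new ArrayList<>(used), newGroupKeys, newAggArgs);
     }
   }
   ```
   
   For example, grouping by input field 2 and computing `SUM` over field 5 gives a projection of `[2, 5]`, with the group key renumbered to `0` and the `SUM` argument to `1`. For `COUNT(*)` with no GROUP BY, the aggregate references no input fields, so the projected list comes back empty.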


[GitHub] [pinot] walterddr commented on issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets

Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on issue #10129:
URL: https://github.com/apache/pinot/issues/10129#issuecomment-1410613975

   Is this issue fixed by #10187?


[GitHub] [pinot] walterddr commented on issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets

Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #10129:
URL: https://github.com/apache/pinot/issues/10129#issuecomment-1397438501

   Did a bit of research on this topic:
   
   The problem only occurs with `COUNT(*)`, `COUNT(1)`, or `COUNT(col)`. This is because Calcite optimizes out the argument, since all columns in Pinot are marked non-nullable.
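   The simplification relies on a standard SQL identity: `COUNT(expr)` counts the rows where `expr` is not null, so if `expr` can never be null it is equivalent to `COUNT(*)` and no longer references any input column. A toy Java version of that rewrite is sketched below; `CountSimplifier`, `AggCall`, and the nullability map are hypothetical, and this is not the code path Calcite actually uses.
   
   ```java
   import java.util.List;
   import java.util.Map;
   
   /** Toy rewrite of COUNT(non-null expr) to COUNT(*) (hypothetical, not Calcite code). */
   final class CountSimplifier {
     private CountSimplifier() {}
   
     /** An aggregate call over named inputs; an empty argument list means COUNT(*). */
     static final class AggCall {
       final String function;
       final List<String> args;
   
       AggCall(String function, List<String> args) {
         this.function = function;
         this.args = args;
       }
     }
   
     /** Drops COUNT's arguments when none of them can be null. */
     static AggCall simplify(AggCall call, Map<String, Boolean> nullable) {
       if (!"COUNT".equals(call.function) || call.args.isEmpty()) {
         return call; // not COUNT, or already COUNT(*)
       }
       for (String arg : call.args) {
         boolean isIntLiteral = arg.matches("-?\\d+");
         // Integer literals are never null; unknown columns are conservatively nullable.
         if (!isIntLiteral && nullable.getOrDefault(arg, true)) {
           return call;
         }
       }
       // Every argument is non-null, so the call counts all rows: COUNT(*).
       return new AggCall("COUNT", List.of());
     }
   }
   ```
   
   With every Pinot column marked non-nullable, both `COUNT(playerID)` and `COUNT(1)` collapse to an argument-less `COUNT`, which is why nothing above the scan references a column any more.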
   
   1. Several approaches have been tried, but the most promising so far is to replace `COUNT` with `SUM(1)`, which bypasses Calcite's optimization. With this change, projection pushdown is executed properly.
   2. @ankitsultana also reported that the pushdown is not applied properly to semi-joins with a `NOT IN` clause:
   ```
   SELECT
     SUM(1)
   FROM
     baseballStats_OFFLINE
   WHERE
     hits > 10 AND
     playerID NOT IN (SELECT playerID FROM baseballStats_OFFLINE WHERE hits < 5)
   ```
   ^ This seems to be related to the fact that `NOT IN` produces an inequality JOIN condition that cannot be optimized into a proper JOIN, so the proper pushdown rules cannot be applied. We will address this separately.
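   One caveat with the `SUM(1)` workaround in item 1: the two aggregates only agree when at least one row reaches the aggregate. In standard SQL, `COUNT(*)` over zero rows returns 0, while `SUM` over zero rows returns NULL, so for a query without GROUP BY (like the repro, whose filters could remove every row) an exactly equivalent rewrite would be `COALESCE(SUM(1), 0)`. The toy model below (hypothetical names, not Pinot or Calcite code) only spells out those standard semantics; whether the multistage engine follows standard NULL semantics here is not established in this thread.
   
   ```java
   import java.util.List;
   
   /** Toy standard-SQL semantics of COUNT(*) vs SUM(1) (hypothetical, not Pinot code). */
   final class CountVsSumOne {
     private CountVsSumOne() {}
   
     /** COUNT(*): number of rows reaching the aggregate; 0 for empty input. */
     static long countStar(List<?> rows) {
       return rows.size();
     }
   
     /** SUM(1): adds the literal 1 per row; SUM over zero rows is NULL (modeled as null). */
     static Long sumOne(List<?> rows) {
       if (rows.isEmpty()) {
         return null;
       }
       long sum = 0;
       for (Object ignored : rows) {
         sum += 1;
       }
       return sum;
     }
   
     /** COALESCE(SUM(1), 0): matches COUNT(*) even when no rows survive the filters. */
     static long coalescedSumOne(List<?> rows) {
       Long sum = sumOne(rows);
       return sum == null ? 0L : sum;
     }
   }
   ```
   
   With GROUP BY the difference does not arise, since every output group contains at least one row.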
   


[GitHub] [pinot] ankitsultana commented on issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana commented on issue #10129:
URL: https://github.com/apache/pinot/issues/10129#issuecomment-1412270485

   Yes, I think we can close this now.


[GitHub] [pinot] ankitsultana closed issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets

Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana closed issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets
URL: https://github.com/apache/pinot/issues/10129

