Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2023/01/14 07:50:29 UTC
[GitHub] [pinot] ankitsultana opened a new issue, #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets
ankitsultana opened a new issue, #10129:
URL: https://github.com/apache/pinot/issues/10129
Repro:
```
SELECT
COUNT(*)
FROM
baseballStats_OFFLINE AS A
JOIN baseballStats_OFFLINE AS B ON A.playerID = B.playerID
WHERE
A.hits > 10 AND B.hits < 5
```
This query ends up reading every column in the table-scan stage, because Calcite does not create a projection node for it. (I will add more repro queries soon.)
I am testing out a few approaches for a fix in this PR: https://github.com/apache/pinot/pull/10122
Btw, this is what GPT recommends:
cc: @walterddr
<img width="786" alt="image" src="https://user-images.githubusercontent.com/8644710/212462065-5cec3444-31d4-4965-b4f3-f771303a1fa5.png">
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] ankitsultana commented on issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets
Posted by GitBox <gi...@apache.org>.
ankitsultana commented on issue #10129:
URL: https://github.com/apache/pinot/issues/10129#issuecomment-1398417863
@walterddr : Can we consider using `RelFieldTrimmer`?
I raised this: https://github.com/apache/pinot/pull/10156
It seems to work, and I am testing it out on our clusters. The PR is not final, as I am still reading up on this feature. There are also some caveats about it called out in the code.
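For readers unfamiliar with field trimming, the idea can be sketched roughly as below. This is a toy model in Python, not Calcite's `RelFieldTrimmer` API: the node classes, the `required_columns` helper, and the `ALL_COLUMNS` list are all hypothetical and exist only to show how "which columns does the parent need?" can be propagated down to each scan of the repro query.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union


# Toy plan nodes. These are NOT Calcite classes, only an illustration.
@dataclass
class Scan:
    alias: str
    columns: List[str]  # every column the table has


@dataclass
class Filter:
    input: "Node"
    used: Set[str]  # qualified columns referenced by the predicate


@dataclass
class Join:
    left: "Node"
    right: "Node"
    used: Set[str]  # qualified columns referenced by the join condition


@dataclass
class Aggregate:
    input: "Node"
    used: Set[str]  # columns referenced by group keys and aggregate arguments


Node = Union[Scan, Filter, Join, Aggregate]


def required_columns(node: Node, needed: Set[str], out: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Walk the plan top-down and record the columns each scan really needs."""
    if isinstance(node, Scan):
        prefix = node.alias + "."
        out[node.alias] = {c for c in needed if c.startswith(prefix)}
    elif isinstance(node, Join):
        needed = needed | node.used
        required_columns(node.left, needed, out)
        required_columns(node.right, needed, out)
    elif isinstance(node, Aggregate):
        # An aggregate produces new output columns, so only its own inputs matter below it.
        required_columns(node.input, set(node.used), out)
    else:  # Filter
        required_columns(node.input, needed | node.used, out)
    return out


# Illustrative subset of columns; the real table has more.
ALL_COLUMNS = ["playerID", "yearID", "teamID", "hits", "homeRuns"]

# The repro query: COUNT(*) over a filtered self-join. COUNT(*) references no columns.
plan = Aggregate(
    used=set(),
    input=Filter(
        used={"A.hits", "B.hits"},
        input=Join(
            used={"A.playerID", "B.playerID"},
            left=Scan("A", ALL_COLUMNS),
            right=Scan("B", ALL_COLUMNS),
        ),
    ),
)

trimmed = required_columns(plan, set(), {})
```

In this toy model each scan needs only `playerID` and `hits`, while an untrimmed scan would read everything in `ALL_COLUMNS`. The real trimmer works on field indexes and rewrites the plan, which this sketch does not attempt.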
[GitHub] [pinot] walterddr commented on issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets
Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #10129:
URL: https://github.com/apache/pinot/issues/10129#issuecomment-1384333850
sounds like a good idea. :-P
[GitHub] [pinot] walterddr commented on issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets
Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #10129:
URL: https://github.com/apache/pinot/issues/10129#issuecomment-1385871610
We can leverage Calcite's `AggregateExtractProjectRule` and split the aggregate into aggregate + project.
This is only needed for `COUNT(*)`; I realized that `SUM` and the others already put the project in place properly.
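As a rough illustration of what "split the aggregate into aggregate + project" means, here is a small Python sketch. It is not Calcite's rule and does not use any Calcite API; `extract_project` and the field list are hypothetical. It collects the input fields an aggregate references, builds a narrower projection from them, and remaps the aggregate's field indexes onto that projection.

```python
from typing import List, Tuple


def extract_project(
    input_fields: List[str],
    group_keys: List[int],
    agg_args: List[List[int]],
) -> Tuple[List[str], List[int], List[List[int]]]:
    """Return (projected field names, remapped group keys, remapped aggregate args)."""
    # Every input index the aggregate touches, in input order.
    used = sorted(set(group_keys) | {i for args in agg_args for i in args})
    # Old input index -> position in the new, narrower projection.
    remap = {old: new for new, old in enumerate(used)}
    projected = [input_fields[i] for i in used]
    new_keys = [remap[i] for i in group_keys]
    new_args = [[remap[i] for i in args] for args in agg_args]
    return projected, new_keys, new_args


# Illustrative subset of columns; the real table has more.
fields = ["playerID", "yearID", "teamID", "hits", "homeRuns"]

# Hypothetical query: SUM(hits) GROUP BY teamID.
grouped = extract_project(fields, group_keys=[2], agg_args=[[3]])

# COUNT(*) with no grouping references no input fields at all.
count_star = extract_project(fields, group_keys=[], agg_args=[[]])
```

In the `COUNT(*)` case the toy projection comes out empty. How a real planner represents that zero-field case is not modeled here.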
[GitHub] [pinot] walterddr commented on issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets
Posted by "walterddr (via GitHub)" <gi...@apache.org>.
walterddr commented on issue #10129:
URL: https://github.com/apache/pinot/issues/10129#issuecomment-1410613975
Is this issue fixed by #10187?
[GitHub] [pinot] walterddr commented on issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets
Posted by GitBox <gi...@apache.org>.
walterddr commented on issue #10129:
URL: https://github.com/apache/pinot/issues/10129#issuecomment-1397438501
did a bit of research on this topic:
The problem only occurs with `COUNT(*)`, `COUNT(1)`, or `COUNT(col)`. This is because Calcite optimizes out the argument, since all columns in Pinot are marked non-nullable.
1. Several approaches have been tried, but the most promising so far is to replace `COUNT` with `SUM(1)`, which bypasses Calcite's optimization. With this change, project pushdown is applied properly.
2. @ankitsultana also reported that the pushdown doesn't apply properly to semi-JOINs with a NOT IN clause:
```
SELECT
SUM(1)
FROM
baseballStats_OFFLINE
WHERE
hits > 10 AND
playerID NOT IN (SELECT playerID FROM baseballStats_OFFLINE WHERE hits < 5)
```
^ This seems to be related to the fact that NOT IN produces an inequality JOIN condition, which cannot be optimized into a proper JOIN, so the usual pushdown rules cannot be applied. We will address that separately.
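Going back to point 1 (the `COUNT` to `SUM(1)` rewrite), one thing worth checking is how the two behave when no rows match. The sketch below uses Python's built-in `sqlite3` module with a made-up `stats` table, so it shows SQLite's behaviour only. Whether Pinot's multistage engine returns the same result on empty input is an assumption this thread does not confirm.

```python
import sqlite3
from typing import List, Optional, Tuple


def count_vs_sum(rows: List[Tuple[str, int]], threshold: int) -> Tuple[int, Optional[int]]:
    """Run COUNT(*) and SUM(1) with the same filter on an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical two-column table, both columns non-nullable as in the thread.
        conn.execute("CREATE TABLE stats (playerID TEXT NOT NULL, hits INTEGER NOT NULL)")
        conn.executemany("INSERT INTO stats VALUES (?, ?)", rows)
        count_star = conn.execute(
            "SELECT COUNT(*) FROM stats WHERE hits > ?", (threshold,)
        ).fetchone()[0]
        sum_one = conn.execute(
            "SELECT SUM(1) FROM stats WHERE hits > ?", (threshold,)
        ).fetchone()[0]
        return count_star, sum_one
    finally:
        conn.close()


sample = [("a", 12), ("b", 3), ("c", 20)]

some_match = count_vs_sum(sample, 10)   # two rows have hits > 10
none_match = count_vs_sum(sample, 100)  # no row has hits > 100
```

When rows match, both queries agree. When nothing matches, SQLite returns 0 for `COUNT(*)` but NULL (`None` in Python) for `SUM(1)`, so a rewrite along these lines may need to handle the empty case explicitly.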
[GitHub] [pinot] ankitsultana commented on issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets
Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana commented on issue #10129:
URL: https://github.com/apache/pinot/issues/10129#issuecomment-1412270485
Yes I think we can close this now.
[GitHub] [pinot] ankitsultana closed issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets
Posted by "ankitsultana (via GitHub)" <gi...@apache.org>.
ankitsultana closed issue #10129: [multistage] Projection Not Pushed Down for Agg Queries with No Grouping Sets
URL: https://github.com/apache/pinot/issues/10129