Posted to issues@iceberg.apache.org by "matthewwillian (via GitHub)" <gi...@apache.org> on 2023/04/28 16:40:05 UTC
[GitHub] [iceberg] matthewwillian opened a new issue, #7461: Predicate Pushdown Not Working as Expected
matthewwillian opened a new issue, #7461:
URL: https://github.com/apache/iceberg/issues/7461
### Query engine
Glue 3.0
### Question
I have the following query:
```
results_df = spark.sql(f'''
    MERGE INTO {args["catalog"]}.{args["database"]}.{args["output_table"]} t
    USING (SELECT * FROM {NEW_EVENTS_DATA_VIEW}) s
    ON (s.client_id = t.client_id AND
        s.environment_type = t.environment_type AND
        s.customer_id_hash = t.customer_id_hash AND
        s.timestamp = t.timestamp AND
        s.customer_id = t.customer_id AND
        s.id = t.id)
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *''')
```
It merges into a table with the following schema:
```
# Table schema:
# col_name          data_type             comment
id                  string
client_id           string
environment_type    string
customer_id         string
customer_id_hash    string
timestamp           timestamp
transaction_id      string
event_type          string
received_at         timestamp
properties          map<string, string>
topic               string
partition           int
offset              bigint
jws_data            string

# Partition spec:
# field_name        field_transform       column_name
client_id           identity              client_id
environment_type    identity              environment_type
customer_id_hash    identity              customer_id_hash
timestamp_day       day                   timestamp
```
When I run this query, the Spark query plan appears to show that predicate pushdown is not happening on the `timestamp_day` partitions. Is that intended behavior?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
Re: [I] Predicate Pushdown Not Working as Expected [iceberg]
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #7461: Predicate Pushdown Not Working as Expected
URL: https://github.com/apache/iceberg/issues/7461
Re: [I] Predicate Pushdown Not Working as Expected [iceberg]
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #7461:
URL: https://github.com/apache/iceberg/issues/7461#issuecomment-1817684755
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'
[GitHub] [iceberg] RussellSpitzer commented on issue #7461: Predicate Pushdown Not Working as Expected
Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on issue #7461:
URL: https://github.com/apache/iceberg/issues/7461#issuecomment-1527837452
What's the plan, and why would there be a pushdown on `timestamp_day`? The pushdown would be on `timestamp`, which Iceberg would then convert to an expression on `timestamp_day`.
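
To illustrate the conversion described above, here is a minimal sketch of how a predicate on `timestamp` can be projected onto a day-partition value. It is a simplified illustration, not Iceberg's actual implementation: the function names `day_transform`, `project_equality`, and `project_range` are hypothetical, and it assumes timezone-aware timestamps and a day value counted as days since 1970-01-01 UTC.

```python
from datetime import datetime, timezone

# Assumption: day partition values are whole days since the Unix epoch (UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    """Map a timezone-aware timestamp to its day ordinal (days since epoch, UTC)."""
    return (ts.astimezone(timezone.utc) - EPOCH).days


def project_equality(ts: datetime) -> tuple:
    """Project `timestamp = ts` to a predicate on the partition field.

    Any row matching the timestamp must live in the partition for that day,
    so only that partition needs to be scanned.
    """
    return ("timestamp_day", "=", day_transform(ts))


def project_range(lower: datetime, upper: datetime) -> tuple:
    """Project `timestamp >= lower AND timestamp <= upper` to an inclusive day range."""
    return (day_transform(lower), day_transform(upper))


if __name__ == "__main__":
    ts = datetime(2023, 4, 28, 16, 40, 5, tzinfo=timezone.utc)
    print(project_equality(ts))
```

The point of the sketch is that the filter the engine pushes down names `timestamp`, not `timestamp_day`; the partition predicate is derived from it on the table side, which is why a plan would not be expected to show a filter on `timestamp_day` directly.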
[GitHub] [iceberg] RussellSpitzer commented on issue #7461: Predicate Pushdown Not Working as Expected
Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on issue #7461:
URL: https://github.com/apache/iceberg/issues/7461#issuecomment-1527882660
Oh, so you are looking for dynamic runtime partition pruning?
Re: [I] Predicate Pushdown Not Working as Expected [iceberg]
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #7461:
URL: https://github.com/apache/iceberg/issues/7461#issuecomment-1780223461
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
[GitHub] [iceberg] matthewwillian commented on issue #7461: Predicate Pushdown Not Working as Expected
Posted by "matthewwillian (via GitHub)" <gi...@apache.org>.
matthewwillian commented on issue #7461:
URL: https://github.com/apache/iceberg/issues/7461#issuecomment-1527886235
If by that you mean what's described in this article, then yes: https://medium.com/@prabhakaran.electric/spark-3-0-feature-dynamic-partition-pruning-dpp-to-avoid-scanning-irrelevant-data-1a7bbd006a89. Will it work if I just enable that config in my Spark job? If so, are there tradeoffs to consider when enabling that config?
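
For context, the Spark setting usually meant by "dynamic partition pruning" is `spark.sql.optimizer.dynamicPartitionPruning.enabled`, which defaults to true in Spark 3.x. The sketch below only shows how such a setting could be applied and read back; `apply_dpp_conf` is a hypothetical helper, `spark` is assumed to be an existing SparkSession, and nothing in this thread confirms that the setting changes how this `MERGE INTO` is planned on Glue 3.0.

```python
# Spark's dynamic partition pruning switch (a real Spark SQL config key).
DPP_KEY = "spark.sql.optimizer.dynamicPartitionPruning.enabled"


def apply_dpp_conf(spark, enabled: bool = True) -> str:
    """Set the DPP flag on a session and return the value Spark reports back.

    Assumption: `spark` exposes `conf.set` / `conf.get` like a SparkSession.
    """
    spark.conf.set(DPP_KEY, str(enabled).lower())
    return spark.conf.get(DPP_KEY)
```

Calling `apply_dpp_conf(spark)` before running the merge and then comparing the output of `EXPLAIN` with and without the flag would be one way to check whether it has any effect on the plan.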
[GitHub] [iceberg] matthewwillian commented on issue #7461: Predicate Pushdown Not Working as Expected
Posted by "matthewwillian (via GitHub)" <gi...@apache.org>.
matthewwillian commented on issue #7461:
URL: https://github.com/apache/iceberg/issues/7461#issuecomment-1527878730
Attaching the query plan
[query_plan.txt](https://github.com/apache/iceberg/files/11355981/query_plan.txt)