Posted to issues@iceberg.apache.org by "matthewwillian (via GitHub)" <gi...@apache.org> on 2023/04/28 16:40:05 UTC

[GitHub] [iceberg] matthewwillian opened a new issue, #7461: Predicate Pushdown Not Working as Expected

matthewwillian opened a new issue, #7461:
URL: https://github.com/apache/iceberg/issues/7461

   ### Query engine
   
   Glue 3.0
   
   ### Question
   
   I have the following query:
   ```
   results_df = spark.sql(f'''
   MERGE INTO {args["catalog"]}.{args["database"]}.{args["output_table"]} t
   USING (SELECT * FROM {NEW_EVENTS_DATA_VIEW}) s
   ON (s.client_id = t.client_id AND
       s.environment_type = t.environment_type AND
       s.customer_id_hash = t.customer_id_hash AND
       s.timestamp = t.timestamp AND
       s.customer_id = t.customer_id AND
       s.id = t.id)
   WHEN MATCHED THEN UPDATE SET *
   WHEN NOT MATCHED THEN INSERT *''')
   ```
   which merges into a table with the following schema:
   ```
   # Table schema:		
   # col_name	data_type	comment
   id	string	
   client_id	string	
   environment_type	string	
   customer_id	string	
   customer_id_hash	string	
   timestamp	timestamp	
   transaction_id	string	
   event_type	string	
   received_at	timestamp	
   properties	map<string, string>	
   topic	string	
   partition	int	
   offset	bigint	
   jws_data	string	
   		
   # Partition spec:		
   # field_name	field_transform	column_name
   client_id	identity	client_id
   environment_type	identity	environment_type
   customer_id_hash	identity	customer_id_hash
   timestamp_day	day	timestamp
   ```
   When I run this query, the Spark query plan appears to show that predicate pushdown is not happening on the `timestamp_day` partition field. Is that the intended behavior?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [I] Predicate Pushdown Not Working as Expected [iceberg]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #7461: Predicate Pushdown Not Working as Expected
URL: https://github.com/apache/iceberg/issues/7461


Re: [I] Predicate Pushdown Not Working as Expected [iceberg]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #7461:
URL: https://github.com/apache/iceberg/issues/7461#issuecomment-1817684755

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'


[GitHub] [iceberg] RussellSpitzer commented on issue #7461: Predicate Pushdown Not Working as Expected

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on issue #7461:
URL: https://github.com/apache/iceberg/issues/7461#issuecomment-1527837452

   What's the plan, and why would there be a pushdown on `timestamp_day`? The pushdown would be on `timestamp`, which Iceberg would then convert to an expression on `timestamp_day`.
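
To illustrate the conversion described above, here is a minimal sketch of how a range predicate on `timestamp` can be projected onto a day-transform partition field such as `timestamp_day`. It assumes the day transform stores the number of days since 1970-01-01 in UTC; the function names `day_ordinal` and `project_to_day_range` are hypothetical and are not Iceberg APIs.

```python
from datetime import datetime, timezone
from typing import Tuple

# Assumption: day partitions are counted from the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_ordinal(ts: datetime) -> int:
    """Return whole days since the epoch for a timezone-aware timestamp."""
    # timedelta.days floors toward negative infinity, so pre-epoch values
    # also land in the correct day bucket.
    return (ts - EPOCH).days


def project_to_day_range(lower: datetime, upper: datetime) -> Tuple[int, int]:
    """Project `lower <= timestamp <= upper` onto an inclusive day range.

    Only partitions whose day value falls inside the returned range can
    contain matching rows; all other partitions can be skipped.
    """
    return day_ordinal(lower), day_ordinal(upper)


if __name__ == "__main__":
    lo = datetime(2023, 4, 27, 23, 0, tzinfo=timezone.utc)
    hi = datetime(2023, 4, 28, 16, 40, 5, tzinfo=timezone.utc)
    print(project_to_day_range(lo, hi))
```

The point of the sketch is that the filter is written against `timestamp`, never against `timestamp_day`; the partition-level range is derived from it.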


[GitHub] [iceberg] RussellSpitzer commented on issue #7461: Predicate Pushdown Not Working as Expected

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on issue #7461:
URL: https://github.com/apache/iceberg/issues/7461#issuecomment-1527882660

   Oh, so you are looking for dynamic runtime partition pruning?


Re: [I] Predicate Pushdown Not Working as Expected [iceberg]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #7461:
URL: https://github.com/apache/iceberg/issues/7461#issuecomment-1780223461

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.


[GitHub] [iceberg] matthewwillian commented on issue #7461: Predicate Pushdown Not Working as Expected

Posted by "matthewwillian (via GitHub)" <gi...@apache.org>.
matthewwillian commented on issue #7461:
URL: https://github.com/apache/iceberg/issues/7461#issuecomment-1527886235

   If by that you mean what's described in this article (https://medium.com/@prabhakaran.electric/spark-3-0-feature-dynamic-partition-pruning-dpp-to-avoid-scanning-irrelevant-data-1a7bbd006a89), then yes. Will it work if I just enable that config in my Spark job? If so, are there tradeoffs to consider when enabling it?
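
A possible alternative to relying on runtime pruning is to give the planner a static predicate on the target's `timestamp` column. The sketch below builds the same MERGE statement with literal bounds added to the ON clause. It assumes the bounds are the minimum and maximum `timestamp` of the source view, computed beforehand; `build_merge_sql` and its parameters are hypothetical names, and whether the bounds are actually pushed down depends on the Spark and Iceberg versions in use.

```python
from datetime import datetime

# Microsecond precision so the upper bound never excludes the latest row.
_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def build_merge_sql(target: str, source_view: str,
                    min_ts: datetime, max_ts: datetime) -> str:
    """Build the MERGE statement with static bounds on t.timestamp.

    Because the join already requires s.timestamp = t.timestamp, bounds
    taken from the source's min/max timestamp do not change which rows
    match; they only add a literal filter on the target side.
    """
    lo = min_ts.strftime(_TS_FORMAT)
    hi = max_ts.strftime(_TS_FORMAT)
    return f"""
MERGE INTO {target} t
USING (SELECT * FROM {source_view}) s
ON (s.client_id = t.client_id AND
    s.environment_type = t.environment_type AND
    s.customer_id_hash = t.customer_id_hash AND
    s.timestamp = t.timestamp AND
    s.customer_id = t.customer_id AND
    s.id = t.id AND
    t.timestamp >= TIMESTAMP '{lo}' AND
    t.timestamp <= TIMESTAMP '{hi}')
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *"""


if __name__ == "__main__":
    # Hypothetical table and view names, for illustration only.
    print(build_merge_sql("my_catalog.my_db.events", "new_events",
                          datetime(2023, 4, 1), datetime(2023, 4, 28, 16, 40, 5)))
```

In a Spark job the bounds could come from a query such as `SELECT min(timestamp), max(timestamp)` over the source view, and the resulting string would be passed to `spark.sql(...)` as in the original query. Comparing the query plans with and without the bounds would show whether they help.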


[GitHub] [iceberg] matthewwillian commented on issue #7461: Predicate Pushdown Not Working as Expected

Posted by "matthewwillian (via GitHub)" <gi...@apache.org>.
matthewwillian commented on issue #7461:
URL: https://github.com/apache/iceberg/issues/7461#issuecomment-1527878730

   Attaching the query plan
   [query_plan.txt](https://github.com/apache/iceberg/files/11355981/query_plan.txt)

