You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/11 15:59:53 UTC

[GitHub] [iceberg] szehon-ho opened a new issue #2326: Partition Table Performance

szehon-ho opened a new issue #2326:
URL: https://github.com/apache/iceberg/issues/2326


   I was struggling the other day with listing partitions of a table with 4000 manifest files, it takes ~10 minutes on S3, which is a bit of a shock for people used to the speed of querying Hive metastore.  While it is possible to optimize the metadata, was chatting a bit with @RussellSpitzer about this issue, and seems the way to go is making the reading of PartitionTable a proper Spark job with predicate push down (https://github.com/apache/iceberg/pull/1421 and https://github.com/apache/iceberg/issues/1552)
   
   Going back to the common use-case of listing partitions, one quick win may be getting the min/max partition, which I saw a lot in Hive, by schedulers like airflow sensor to detect new data, or for retention tools to detect and kick-in the old data. 
   
   Unfortunately it seems that aggregate push-down is not supported in Spark DSV2, unless I am mistaken, which would be the ideal, then it can pushed down to the PartitionTable.  But the information seems also available on manifest-list (partition boundaries), which would be really fast, but does it make sense to expose this just as an Iceberg API for a quick win?
   
   Another workaround is just to push down predicate filter in the existing non-distributed PartitionTable TableScan, which is not done today.  I think it would be a good change anyway.  The query logic is a bit more complex to find the max/min partition (keep expanding the predicate from the expected latest until you hit something), but it's useful to answer other queries like for example how many partitions for a given day.  Is there any reason not to do this? 
   
   @rdblue  @aokolnychyi  do you guys have any thoughts about these ideas, or any other thoughts you had in the past about it?  Thanks 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-798797152


   Nice, I think with aggregate pushdown it could potentially open many faster queries in Iceberg overall, given how much metadata we have, if we are ok to answer queries with them.
   
   I think #2182 is interesting, but it might be true that it's not worth the cost if it adds a ton more time for each commit.
   
   @aokolnychyi  I gave a try today for doing equivalent query on files table, it's much faster (~minute vs ~10s of minutes).  I even added distinct in the end and it does not add much time.  It's a shame that users first try partitions table and not files table then.  I guess there's not much we can do unless we have this support?
   
   By the way as Russell was pointing to me, I was looking at making an improvement by adding predicate pushdown using ManifestEvaluator to filter out manifest-files, as the Manifest List has each manifest-file's partition min/max.  If I understand correctly, it requires converting a filter on the "partition" table (partition.part_field = x) to a ManifestGroup "partition filter" (part_field = x).  Now I think if this functionality will be compatible with later rewriting partitions table to use view of files table (I guess, at that point, we make the equivalent pushdown on file-table using 'partition.x' field which is not done today there either).
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi edited a comment on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
aokolnychyi edited a comment on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-797711579


   Well, it was expected the partitions table would not perform that well when a lot of metadata is present. I had [this](https://github.com/apache/iceberg/pull/655#discussion_r347950984) comment on the original PR.
   
   Doing an aggregate on top of the files metadata table in Spark would be way faster but not instant. Have you tried that as a temp solution, @szehon-ho? I just wonder what's the difference in terms of performance. You can tune the input split size for metadata tables to have reasonable parallelism.
   
   There is also an interesting proposal in #2182. Can we potentially benefit from it here?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-797768675


   @Parth-Brahmbhatt, yeah, that would be great to do that when Spark has the view catalog.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-796843692


   I think the predicate pushdown approach is probably the simplest solution at the moment, if you look in the Partition table code it creates a Scan which can have predicates applied to it, these should be filtered in the same way a normal scan would be.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-797814544


   @sunchao, it is great to see progress on that one! 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] sunchao commented on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
sunchao commented on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-797725905


   FWIW aggregation pushdown is being worked on in Spark at the moment ([PR](https://github.com/apache/spark/pull/29695)). It targets JDBC datasources right now but @huaxingao is extending that to Parquet/Iceberg.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho edited a comment on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
szehon-ho edited a comment on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-796845953


   Yea I was thinking that, I can take a quick look at this.  Wondering about exposing Partition min/max in API based on what is 
   on the current snapshot's manifest-list, I guess it's a bit too internal to V1 spec?  Even though it'd be a bit useful for a super quick response.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho closed issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
szehon-ho closed issue #2326:
URL: https://github.com/apache/iceberg/issues/2326


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Parth-Brahmbhatt commented on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
Parth-Brahmbhatt commented on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-797719341


   In presto , we basically implemented the partitions implementation as a view on top of files table and even for large table the performance is acceptable. We could do the same for spark and implement partitions as a view on top of files so the users don't really have to craft a query manually


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho edited a comment on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
szehon-ho edited a comment on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-798797152


   Nice, I think with aggregate pushdown it could potentially open many faster queries in Iceberg overall, given how much metadata we have, if we are ok to answer queries with them.
   
   I think #2182 is interesting, but it might be true that it's not worth the cost if it adds a ton more time for each commit.
   
   @aokolnychyi  I gave a try today for doing the equivalent query on the files table, and it's much faster (~minute vs ~10s of minutes).  I even added distinct in the end and it does not add much time.  It's a shame that users first try partitions table and not files table then for listing partitions.  I guess there's not much we can do unless we have the spark view catalog support as mentioned in that PR?
   
   By the way, as Russell was pointing to me, I was looking at making an improvement by adding predicate pushdown using ManifestEvaluator to filter out manifest-files, as the Manifest List already has each manifest-file's partition min/max.  If I understand correctly, it requires converting a filter on the "partition" table (partition.part_field = x) to a "partition filter" (part_field = x).  Now it makes me wonder if this functionality will be compatible with later rewriting partitions table to use view of files table.  I guess it would be, as we should have an equivalent partition filter on files_table as well (partition.part_field = x, which is not there today).
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-797814402


   @szehon-ho, could you please try the equivalent query using the files table?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-797712252


   I definitely agree knowing a list of partitions in a table is a very common scenario and it should be quick.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-797711579


   Well, it was expected the partitions table would not perform that well when a lot of metadata is present. I had [this](https://github.com/apache/iceberg/pull/655#discussion_r347950984) comment on the original PR.
   
   Doing an aggregate on top of the files metadata table in Spark would be way faster but not instant. Have you tried that as a temp solution, @szehon-ho? I just wonder the difference in terms of performance.
   
   There is also an interesting proposal in #2182. Can we potentially benefit from it here?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-796868732


   Yea sorry if I missed your point, I think pushing down expression to PartitionTable can either be Spark or current route (for Spark can wait until you PR is done).  
   
   But min/max is not an expression (rather an aggregate)?  Which I guess is not pushed down by Spark for now?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi edited a comment on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
aokolnychyi edited a comment on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-797711579


   Well, it was expected the partitions table would not perform that well when a lot of metadata is present. I had [this](https://github.com/apache/iceberg/pull/655#discussion_r347950984) comment on the original PR.
   
   Doing an aggregate on top of the files metadata table in Spark would be way faster but not instant. Have you tried that as a temp solution, @szehon-ho? I just wonder what's the difference in terms of performance.
   
   There is also an interesting proposal in #2182. Can we potentially benefit from it here?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-796845953


   Yea I was thinking that, I can take a quick look at this.  Wondering about adding Partition boundaries to API based min/max  based on the current snapshot's manifest-list, I guess it's a bit too internal to V1 spec?  Even though it'd be a bit useful for a super quick response.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-796847658


   I think if you wanted to go the spark route you could do the same conversion we already do for Spark Filters -> Iceberg Expressions and just apply them directly? I would think we could just expose "PartitionTable(Expressions)" 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on issue #2326: Partition Table Performance

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on issue #2326:
URL: https://github.com/apache/iceberg/issues/2326#issuecomment-1069484735


   Closing as one fix was merged, can make a new one if more options open up


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org