Posted to issues@iceberg.apache.org by "amogh-jahagirdar (via GitHub)" <gi...@apache.org> on 2023/03/11 22:08:56 UTC

[GitHub] [iceberg] amogh-jahagirdar commented on issue #7071: Dynamic Split generation based on table size

amogh-jahagirdar commented on issue #7071:
URL: https://github.com/apache/iceberg/issues/7071#issuecomment-1465034853

   Thanks for bringing back this discussion, @singhpk234. I think it makes sense for the table format itself to help determine optimal split sizes, because it has the statistics needed to pick good values for a given table.
   
   I think we just need to define what the user experience and the engine integration experience should look like, considering that `read.split.target-size` is already defined with a default of 128 MB. One simple approach that comes to mind is to define another table property, `read.split.auto-size`, which defaults to false. If it's set to true, the engine should ignore `read.split.target-size` and take responsibility for determining the optimal split size from the table stats. The Iceberg library could provide a utility that recommends a size based on table stats; an engine could delegate to that utility, or override it if it has something better suited to the engine. Over time, as we gain confidence in the auto-size behavior across engines, we could make it default to true. These are just my initial thoughts; I'm open to others' ideas here as well.
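
   To make the proposed resolution order concrete, here is a rough sketch in plain Python rather than against the real Iceberg API. `read.split.auto-size` is only the property proposed above and does not exist today. The function names, the `parallelism` input, the 16 MB and 512 MB bounds, and the "bytes per task" heuristic are all assumptions made up for illustration; the actual utility would presumably use richer table stats.

```python
from typing import Callable, Mapping, Optional

MB = 1024 * 1024

# Existing table property and its default of 128 MB.
TARGET_SIZE_PROP = "read.split.target-size"
DEFAULT_TARGET_SIZE = 128 * MB

# Proposed in this comment only; NOT an existing Iceberg property.
AUTO_SIZE_PROP = "read.split.auto-size"

# Hypothetical clamp bounds, chosen purely for illustration.
MIN_SPLIT_SIZE = 16 * MB
MAX_SPLIT_SIZE = 512 * MB


def recommended_split_size(total_bytes: int, parallelism: int) -> int:
    """Hypothetical library utility: spread the table's bytes evenly across
    the available parallelism, then clamp to the illustrative bounds."""
    if parallelism <= 0:
        raise ValueError("parallelism must be positive")
    per_task = -(-total_bytes // parallelism)  # ceiling division
    return max(MIN_SPLIT_SIZE, min(MAX_SPLIT_SIZE, per_task))


def resolve_split_size(
    props: Mapping[str, str],
    total_bytes: int,
    parallelism: int,
    engine_override: Optional[Callable[[int, int], int]] = None,
) -> int:
    """Resolution order sketched in the comment:
    1. auto-size off (the default): honor read.split.target-size.
    2. auto-size on: ignore target-size; use the engine's own heuristic if
       it supplies one, otherwise delegate to the library recommendation."""
    auto = props.get(AUTO_SIZE_PROP, "false").strip().lower() == "true"
    if not auto:
        return int(props.get(TARGET_SIZE_PROP, DEFAULT_TARGET_SIZE))
    if engine_override is not None:
        return engine_override(total_bytes, parallelism)
    return recommended_split_size(total_bytes, parallelism)
```

   With auto-size left at its default of false, existing tables keep their current behavior, which is what would let the flag be flipped to true later without surprising anyone in the meantime.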


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

