Posted to issues@iceberg.apache.org by "puchengy (via GitHub)" <gi...@apache.org> on 2023/04/27 19:07:03 UTC

[GitHub] [iceberg] puchengy commented on pull request #7430: Allow sparksql to override target split size with session property

puchengy commented on PR #7430:
URL: https://github.com/apache/iceberg/pull/7430#issuecomment-1526199803

   Hi @aokolnychyi, I think there is legitimate value in this.
   
   We are migrating hundreds of Hive tables to Iceberg, and ensuring that the SparkSQL consumers of these tables don't fail is our top priority. A SparkSQL job that used to read a Hive table with a particular "spark.sql.files.maxPartitionBytes" value can fail if the Iceberg table's split size differs greatly from that value, because many more splits are generated than the job was tuned for.
   
   It is even more complicated if different downstream jobs have different "spark.sql.files.maxPartitionBytes" values (I am not sure if this really happens, but in theory it could).
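
To make the concern concrete, here is a rough Python sketch of how the split count scales with the target split size. The 1 TiB table size and the 512 MiB job setting are hypothetical numbers chosen for illustration, and 128 MiB is what I believe the default Iceberg "read.split.target-size" to be. The estimate is a plain ceiling division; it ignores file boundaries and open-file cost, so real planning will differ somewhat.

```python
MIB = 1024 * 1024


def estimate_splits(total_bytes: int, target_split_bytes: int) -> int:
    """Approximate split count as ceil(total_bytes / target_split_bytes).

    Simplification: treats the table as one contiguous byte range and
    ignores per-file boundaries and open-file cost.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if target_split_bytes <= 0:
        raise ValueError("target_split_bytes must be positive")
    # Integer ceiling division, avoids floating-point rounding.
    return -(-total_bytes // target_split_bytes)


# Hypothetical table size: 1 TiB.
table_bytes = 1024 * 1024 * MIB

# Hypothetical job tuned against Hive with maxPartitionBytes = 512 MiB.
hive_splits = estimate_splits(table_bytes, 512 * MIB)

# Same data read through Iceberg with a 128 MiB target split size (assumed).
iceberg_splits = estimate_splits(table_bytes, 128 * MIB)

print(hive_splits, iceberg_splits)
```

Under these assumptions the same job sees four times as many splits after the migration, which is the kind of jump that can push a previously stable job over its limits.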
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

