Posted to issues@iceberg.apache.org by "dmgcodevil (via GitHub)" <gi...@apache.org> on 2023/05/03 02:41:07 UTC

[GitHub] [iceberg] dmgcodevil opened a new issue, #7506: Is is possible to control the number of partitions (groups) for compaction ?

dmgcodevil opened a new issue, #7506:
URL: https://github.com/apache/iceberg/issues/7506

   ### Query engine
   
   Trino
   
   ### Question
   
   I've found a new compaction option, `max-file-group-size-bytes`, which is very useful because it allows compacting very large data sets while using sorting. However, is there an option to control the maximum number of groups (partitions)? I could not find one. It would be useful when a lot of partitions need to be compacted, since we'd like to limit the execution time of our compaction Spark job.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] singhpk234 commented on issue #7506: Is is possible to control the number of partitions (groups) for compaction ?

Posted by "singhpk234 (via GitHub)" <gi...@apache.org>.
singhpk234 commented on issue #7506:
URL: https://github.com/apache/iceberg/issues/7506#issuecomment-1532496128

   > It could be useful in a case where a lot of partitions should be compacted. We'd like to limit the execution time of our compaction spark job.
   
   You can pass partition filters via the `where` clause to run compaction on a subset of partitions. Please refer to https://iceberg.apache.org/docs/latest/spark-procedures/#rewrite_data_files
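   
   For illustration, here is a minimal sketch of building such a call from Scala. The catalog and table names are placeholders, the helper `rewriteDataFilesSql` is hypothetical (it is not part of Iceberg), and the `CALL` syntax assumes the Iceberg SQL extensions are enabled in the Spark session:
   
   ```scala
   object RewriteCall {
     // Hypothetical helper: builds a CALL statement for the rewrite_data_files
     // procedure, restricted to the rows matched by a `where` predicate.
     def rewriteDataFilesSql(catalog: String, table: String, where: String): String = {
       // Keep the sketch simple: reject predicates that would need quote escaping.
       require(!where.contains("'"), "use double quotes for string literals in the predicate")
       s"CALL $catalog.system.rewrite_data_files(table => '$table', where => '$where')"
     }
   }
   ```
   
   The resulting string would then be passed to `spark.sql(...)`, once per subset of partitions to compact.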




Re: [I] Is is possible to control the number of partitions (groups) for compaction ? [iceberg]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #7506:
URL: https://github.com/apache/iceberg/issues/7506#issuecomment-1869834307

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'




Re: [I] Is is possible to control the number of partitions (groups) for compaction ? [iceberg]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #7506: Is is possible to control the number of partitions (groups) for compaction ?
URL: https://github.com/apache/iceberg/issues/7506




Re: [I] Is is possible to control the number of partitions (groups) for compaction ? [iceberg]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #7506:
URL: https://github.com/apache/iceberg/issues/7506#issuecomment-1851097979

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.




[GitHub] [iceberg] dmgcodevil commented on issue #7506: Is is possible to control the number of partitions (groups) for compaction ?

Posted by "dmgcodevil (via GitHub)" <gi...@apache.org>.
dmgcodevil commented on issue #7506:
URL: https://github.com/apache/iceberg/issues/7506#issuecomment-1533877858

   @singhpk234 Is `where` only available when using the [Iceberg SQL extensions](https://iceberg.apache.org/docs/latest/spark-configuration#sql-extensions)?
   
   We are using `BaseRewriteDataFilesSpark3Action`:
   
   Example:
   
   ```scala
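       import org.apache.iceberg.actions.RewriteDataFiles
       import org.apache.iceberg.expressions.Expressions
       import org.apache.iceberg.spark.actions.SparkActions

       val result = SparkActions.get().rewriteDataFiles(table)
         // Restrict the rewrite to data matching startDate <= field < endDate
         .filter(Expressions.greaterThanOrEqual(field, startDate * 1000))
         .filter(Expressions.lessThan(field, endDate * 1000))
         // Long arithmetic avoids overflow if targetSizeMB is an Int of 2048 or more
         .option("target-file-size-bytes", String.valueOf(config.targetSizeMB * 1024L * 1024L))
         .option(RewriteDataFiles.MAX_FILE_GROUP_SIZE_BYTES, String.valueOf(10L * 1024L * 1024L * 1024L)) // 10 GiB per file group
         // Commit finished file groups incrementally, in at most 20 commits
         .option(RewriteDataFiles.PARTIAL_PROGRESS_MAX_COMMITS, "20")
         .option(RewriteDataFiles.PARTIAL_PROGRESS_ENABLED, "true")
         .execute()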
   ```
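   
   One possible way to bound the work per job with the action API is to split the overall date range into smaller windows and run the rewrite for only a few windows per job. The sketch below is only an illustration of that idea, not an Iceberg option: it assumes the table is partitioned by day, and the names `CompactionWindows`, `Window`, and `maxDays` are hypothetical. It also does not cap the number of file groups inside a window.
   
   ```scala
   import java.time.LocalDate

   object CompactionWindows {
     /** Half-open date window [start, end). */
     final case class Window(start: LocalDate, end: LocalDate)

     /** Split [start, end) into consecutive windows of at most maxDays days each. */
     def split(start: LocalDate, end: LocalDate, maxDays: Int): List[Window] = {
       require(maxDays > 0, "maxDays must be positive")
       Iterator
         .iterate(start)(_.plusDays(maxDays.toLong))
         .takeWhile(_.isBefore(end))
         .map { s =>
           val e = s.plusDays(maxDays.toLong)
           // Clamp the last window so it never runs past the overall end date.
           Window(s, if (e.isAfter(end)) end else e)
         }
         .toList
     }
   }
   ```
   
   Each window's bounds, converted to the unit of the filtered column, could then be passed to the two `.filter(...)` calls above, with the job processing only as many windows as its time budget allows.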

