Posted to issues@iceberg.apache.org by "RussellSpitzer (via GitHub)" <gi...@apache.org> on 2023/02/10 15:07:28 UTC

[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #6801: API,Core,Spark: Add rewritten bytes to rewrite data files procedure results

RussellSpitzer commented on code in PR #6801:
URL: https://github.com/apache/iceberg/pull/6801#discussion_r1102875656


##########
spark/v3.3/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestRewriteDataFilesProcedure.java:
##########
@@ -69,7 +69,7 @@ public void testZOrderSortExpression() {
   public void testRewriteDataFilesInEmptyTable() {
     createTable();
     List<Object[]> output = sql("CALL %s.system.rewrite_data_files('%s')", catalogName, tableIdent);
-    assertEquals("Procedure output must match", ImmutableList.of(row(0, 0)), output);
+    assertEquals("Procedure output must match", ImmutableList.of(row(0, 0, 0L)), output);

Review Comment:
   These changes are the only ones that I'm worried about. Asserting an exact byte count will be *very* brittle: we cannot guarantee that unrelated changes won't alter the file sizes, such as a Parquet version change or a compression library upgrade. Let's make this either a soft check or compute the expected size programmatically.
   
   So either
   (file size is around X)
   or
   
   (SELECT sum(file_size_in_bytes) FROM table.files) should equal RewriteResult.sizeInBytes
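
The two options could be sketched roughly as follows. This is a minimal illustration, not code from the PR: the class `RewrittenBytesChecks` and both of its methods are hypothetical names, and the per-file sizes are assumed to have been read from the table's files metadata (for example through the test's `sql(...)` helper) before being passed in. Whether those sizes should be captured before or after the rewrite depends on what the procedure reports, which this comment does not settle.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of Iceberg or of PR #6801.
final class RewrittenBytesChecks {

  private RewrittenBytesChecks() {}

  // Soft check: true when `actual` lies within `tolerance` (a fraction, e.g. 0.05 for 5%)
  // of `expected`, so small size shifts from format or codec upgrades do not fail the test.
  static boolean isWithinTolerance(long expected, long actual, double tolerance) {
    if (expected < 0 || actual < 0 || tolerance < 0) {
      throw new IllegalArgumentException("sizes and tolerance must be non-negative");
    }
    return Math.abs(actual - expected) <= tolerance * expected;
  }

  // Programmatic check: sum per-file sizes (assumed to come from the files metadata)
  // so the expected value is derived from the table rather than hard-coded.
  static long totalBytes(List<Long> fileSizesInBytes) {
    long total = 0L;
    for (long size : fileSizesInBytes) {
      total += size;
    }
    return total;
  }
}
```

A test could then assert either `isWithinTolerance(expectedBytes, reportedBytes, 0.05)` or that `totalBytes(sizes)` equals the value the procedure returns, instead of comparing against a literal byte count.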



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org