Posted to gitbox@hive.apache.org by "zhangbutao (via GitHub)" <gi...@apache.org> on 2023/05/09 08:46:06 UTC

[GitHub] [hive] zhangbutao commented on a diff in pull request #4301: HIVE-27327 : Iceberg basic stats: Incorrect row count in snapshot sum…

zhangbutao commented on code in PR #4301:
URL: https://github.com/apache/hive/pull/4301#discussion_r1188325890


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -346,7 +346,16 @@ public Map<String, String> getBasicStatistics(Partish partish) {
               stats.put(StatsSetupConst.NUM_FILES, summary.get(SnapshotSummary.TOTAL_DATA_FILES_PROP));
             }
             if (summary.containsKey(SnapshotSummary.TOTAL_RECORDS_PROP)) {
-              stats.put(StatsSetupConst.ROW_COUNT, summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+              long totalRecords = Long.parseLong(summary.get(SnapshotSummary.TOTAL_RECORDS_PROP));
+              if (summary.containsKey(SnapshotSummary.TOTAL_EQ_DELETES_PROP) &&
+                  summary.containsKey(SnapshotSummary.TOTAL_POS_DELETES_PROP)) {
+                Long actualRecords =

Review Comment:
   Just sharing some thoughts.
   If I understand correctly, a delete file in Iceberg is itself a special kind of data file, and the table scan at execution time still has to read all related delete files.
   
   In other words, the actual execution still has to scan more data than the explain output shows.
   So I am not sure whether this PR can produce an optimized plan when an Iceberg table has both data files and delete files.
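
To make the concern concrete, here is a rough sketch (not the PR's code). It assumes the snapshot summary is available as a plain string map whose keys match Iceberg's `SnapshotSummary` properties (`total-records`, `total-equality-deletes`, `total-position-deletes`, `total-data-files`, `total-delete-files`). The class and method names are hypothetical. Subtracting the delete counts is only an approximation, since an equality delete may match zero rows or many rows.

```java
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Hive or Iceberg.
final class SnapshotStatsSketch {
  // Assumed to mirror the keys of Iceberg's snapshot summary.
  static final String TOTAL_RECORDS = "total-records";
  static final String TOTAL_EQ_DELETES = "total-equality-deletes";
  static final String TOTAL_POS_DELETES = "total-position-deletes";
  static final String TOTAL_DATA_FILES = "total-data-files";
  static final String TOTAL_DELETE_FILES = "total-delete-files";

  private SnapshotStatsSketch() {
  }

  // Reads a numeric summary property, treating a missing key as 0.
  static long get(Map<String, String> summary, String key) {
    String value = summary.get(key);
    return value == null ? 0L : Long.parseLong(value);
  }

  // Approximate number of rows a query would return: records minus deletes,
  // clamped at 0. This is an estimate, not an exact count.
  static long estimateLiveRows(Map<String, String> summary) {
    long deletes = get(summary, TOTAL_EQ_DELETES) + get(summary, TOTAL_POS_DELETES);
    return Math.max(0L, get(summary, TOTAL_RECORDS) - deletes);
  }

  // Number of files the scan has to open: data files plus delete files.
  static long filesToScan(Map<String, String> summary) {
    return get(summary, TOTAL_DATA_FILES) + get(summary, TOTAL_DELETE_FILES);
  }
}
```

With 100 records, 10 position deletes, 4 data files, and 2 delete files, the estimated row count drops to 90 while the scan still opens 6 files. That gap between the adjusted row count and the work actually done is what I am unsure about.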



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
