Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/01/04 10:15:15 UTC

[GitHub] [hudi] xiarixiaoyao edited a comment on pull request #4405: HUDI-3068 Fixing sync all partitions

xiarixiaoyao edited a comment on pull request #4405:
URL: https://github.com/apache/hudi/pull/4405#issuecomment-1004680987


   @nsivabalan 
   With the current logic this problem rarely occurs, because we decide whether a partition needs an alter by comparing whether the partition paths are the same. It is uncommon for the partition paths of a Hudi table to change, although they can be changed through the alter partition syntax.
   
   It is easy to reproduce this problem in the UT code:
   add the following code after line 146 in TestHiveSyncTool
       String testP = Arrays.stream(hiveClient.scanTablePartitions(hiveSyncConfig.tableName).get(0).getValues().get(0).split("-")).collect(Collectors.joining("/"));
       hiveClient.updatePartitionsToTable(hiveSyncConfig.tableName, Arrays.asList(testP));
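
   For readers unfamiliar with the test, here is a minimal standalone sketch of what the first reproduction line does to the partition value. The class name, the helper name `toSlashPath`, and the sample value "2010-01-01" are assumptions made for illustration only; the real value comes from `scanTablePartitions` in the test.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Illustrative only: mirrors the split("-") / joining("/") step from the snippet above.
class PartitionPathSketch {

    // Hypothetical helper: turns a dash-separated partition value into a slash-separated path.
    public static String toSlashPath(String partitionValue) {
        return Arrays.stream(partitionValue.split("-"))
                .collect(Collectors.joining("/"));
    }

    public static void main(String[] args) {
        // Assumed sample value; prints the slash-separated form of it.
        System.out.println(toSlashPath("2010-01-01"));
    }
}
```

   The resulting string is what gets passed to `updatePartitionsToTable`, which is how the snippet forces the alter partition path in the test.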
   
   
   BTW
   When we sync altered partitions, we should set "numFiles" and "totalSize" for the altered partitions.
   
   Since hive.stats.autogather=true by default, Hive tries to calculate the partition stats ("numFiles" and "totalSize") by default:
   1) For an add partition operation: when new partitions are synced to Hive, Hive calls updatePartitionStatsFast to update the stats for every new partition.
   2) For an alter partition operation: the Hive metastore first finds the old partition that needs to be altered;
   it then tries to update the partition stats by comparing the stats of the old partition with those of our altered partition.
   However, the old partition has stats but our altered partition has none (we have not specified them), so the error occurs.
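
   To make the mismatch concrete, here is a rough sketch that models partition parameters as a plain `Map<String, String>`. It is not Hive metastore code: the class and both helpers are hypothetical, and only the key names "numFiles" and "totalSize" come from the description above. It only shows the old partition carrying stats while the altered one does not, and how filling in the two keys removes that difference.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only; the real partition objects live in the Hive metastore.
class PartitionStatsSketch {
    static final String NUM_FILES = "numFiles";
    static final String TOTAL_SIZE = "totalSize";

    // Hypothetical check: true when both stats keys are present in the parameters.
    public static boolean hasStats(Map<String, String> params) {
        return params.containsKey(NUM_FILES) && params.containsKey(TOTAL_SIZE);
    }

    // Hypothetical helper: returns a copy of the altered partition's parameters
    // with the two stats filled in, leaving the input map untouched.
    public static Map<String, String> withStats(Map<String, String> alteredParams,
                                                long numFiles, long totalSize) {
        Map<String, String> result = new HashMap<>(alteredParams);
        result.put(NUM_FILES, Long.toString(numFiles));
        result.put(TOTAL_SIZE, Long.toString(totalSize));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> oldParams = new HashMap<>();
        oldParams.put(NUM_FILES, "3");       // assumed sample values
        oldParams.put(TOTAL_SIZE, "1024");

        Map<String, String> alteredParams = new HashMap<>(); // no stats specified

        // The situation described above: old partition has stats, altered one does not.
        System.out.println("mismatch: " + (hasStats(oldParams) && !hasStats(alteredParams)));

        // After setting the stats explicitly, both sides carry the same keys.
        Map<String, String> fixed = withStats(alteredParams, 3L, 1024L);
        System.out.println("fixed has stats: " + hasStats(fixed));
    }
}
```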
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org