You are viewing a plain text version of this content. The canonical link for it is here.
Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/01/28 07:15:47 UTC

[GitHub] [hive] ayushtkn commented on a change in pull request #2913: HIVE-25840: Prevent duplicate paths in the fileList while adding an e…

ayushtkn commented on a change in pull request #2913:
URL: https://github.com/apache/hive/pull/2913#discussion_r793585254



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/events/InsertEvent.java
##########
@@ -85,7 +87,7 @@ public InsertEvent(String catName, String db, String table, List<String> partVal
 
     // If replace flag is not set by caller, then by default set it to true to maintain backward compatibility
     this.replace = (insertData.isSetReplace() ? insertData.isReplace() : true);
-    this.files = insertData.getFilesAdded();
+    this.files = new LinkedHashSet<>(insertData.getFilesAdded());

Review comment:
       Is there a chance that ``insertData.getFilesAdded()`` can be null? Give a check once, if not sure better to wrap with a ``null`` check.
   The method signature though has an annotation. of ``Nullable``
   ``
     @org.apache.thrift.annotation.Nullable
     public java.util.List<java.lang.String> getFilesAdded() {
   ``

##########
File path: hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/listener/DbNotificationListener.java
##########
@@ -607,9 +606,13 @@ public void remove() {
   @Override
   public void onInsert(InsertEvent insertEvent) throws MetaException {
     Table tableObj = insertEvent.getTableObj();
+    List<String> eventFiles = new ArrayList<>(insertEvent.getFiles());
+    List<String> eventCheckSums = insertEvent.getFileChecksums();
+    if(eventFiles != null && eventCheckSums != null && eventCheckSums.size() != eventFiles.size()) {

Review comment:
       eventFiles can not be null ever, the check isn't required. you are initiating it at L609

##########
File path: hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/listener/DbNotificationListener.java
##########
@@ -607,9 +606,13 @@ public void remove() {
   @Override
   public void onInsert(InsertEvent insertEvent) throws MetaException {
     Table tableObj = insertEvent.getTableObj();
+    List<String> eventFiles = new ArrayList<>(insertEvent.getFiles());
+    List<String> eventCheckSums = insertEvent.getFileChecksums();
+    if(eventFiles != null && eventCheckSums != null && eventCheckSums.size() != eventFiles.size()) {
+      throw new IllegalStateException("Number of files and checksums do not match for insert event");

Review comment:
       eventFiles is a Set, eventChecksums is a list.
   If there are two same files getting added, so two checksums will also get added, but eventCheckSums is a List, won't that trigger this exception always? The size gonna differ always in that case?

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/events/BatchAcidWriteEvent.java
##########
@@ -62,8 +64,8 @@ public Long getTxnId(int idx) {
     return writeNotificationLogRequestList.get(idx).getTxnId();
   }
 
-  public List<String> getFiles(int idx) {
-    return writeNotificationLogRequestList.get(idx).getFileInfo().getFilesAdded();
+  public Set<String> getFiles(int idx) {
+    return new LinkedHashSet<>(writeNotificationLogRequestList.get(idx).getFileInfo().getFilesAdded());

Review comment:
       Give a check to ``getFilesAdded()`` being ``null``, if there is a possibility wrap it up with a ``NULL`` check and keep the set empty in that case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org