You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/01/28 07:16:00 UTC

[jira] [Work logged] (HIVE-25840) Prevent duplicate paths in the fileList while adding an entry to NotifcationLog

     [ https://issues.apache.org/jira/browse/HIVE-25840?focusedWorklogId=716905&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-716905 ]

ASF GitHub Bot logged work on HIVE-25840:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Jan/22 07:15
            Start Date: 28/Jan/22 07:15
    Worklog Time Spent: 10m 
      Work Description: ayushtkn commented on a change in pull request #2913:
URL: https://github.com/apache/hive/pull/2913#discussion_r793585254



##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/events/InsertEvent.java
##########
@@ -85,7 +87,7 @@ public InsertEvent(String catName, String db, String table, List<String> partVal
 
     // If replace flag is not set by caller, then by default set it to true to maintain backward compatibility
     this.replace = (insertData.isSetReplace() ? insertData.isReplace() : true);
-    this.files = insertData.getFilesAdded();
+    this.files = new LinkedHashSet<>(insertData.getFilesAdded());

Review comment:
       Is there a chance that ``insertData.getFilesAdded()`` can be null? Give a check once, if not sure better to wrap with a ``null`` check.
   The method signature though has an annotation. of ``Nullable``
   ``
     @org.apache.thrift.annotation.Nullable
     public java.util.List<java.lang.String> getFilesAdded() {
   ``

##########
File path: hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/listener/DbNotificationListener.java
##########
@@ -607,9 +606,13 @@ public void remove() {
   @Override
   public void onInsert(InsertEvent insertEvent) throws MetaException {
     Table tableObj = insertEvent.getTableObj();
+    List<String> eventFiles = new ArrayList<>(insertEvent.getFiles());
+    List<String> eventCheckSums = insertEvent.getFileChecksums();
+    if(eventFiles != null && eventCheckSums != null && eventCheckSums.size() != eventFiles.size()) {

Review comment:
       eventFiles can not be null ever, the check isn't required. you are initiating it at L609

##########
File path: hcatalog/server-extensions/src/main/java/org/apache/hive/hcatalog/listener/DbNotificationListener.java
##########
@@ -607,9 +606,13 @@ public void remove() {
   @Override
   public void onInsert(InsertEvent insertEvent) throws MetaException {
     Table tableObj = insertEvent.getTableObj();
+    List<String> eventFiles = new ArrayList<>(insertEvent.getFiles());
+    List<String> eventCheckSums = insertEvent.getFileChecksums();
+    if(eventFiles != null && eventCheckSums != null && eventCheckSums.size() != eventFiles.size()) {
+      throw new IllegalStateException("Number of files and checksums do not match for insert event");

Review comment:
       eventFiles is a Set, eventChecksums is a list.
   If there are two same files getting added, so two checksums will also get added, but eventCheckSums is a List, won't that trigger this exception always? The size gonna differ always in that case?

##########
File path: standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/events/BatchAcidWriteEvent.java
##########
@@ -62,8 +64,8 @@ public Long getTxnId(int idx) {
     return writeNotificationLogRequestList.get(idx).getTxnId();
   }
 
-  public List<String> getFiles(int idx) {
-    return writeNotificationLogRequestList.get(idx).getFileInfo().getFilesAdded();
+  public Set<String> getFiles(int idx) {
+    return new LinkedHashSet<>(writeNotificationLogRequestList.get(idx).getFileInfo().getFilesAdded());

Review comment:
       Give a check to ``getFilesAdded()`` being ``null``, if there is a possibility wrap it up with a ``NULL`` check and keep the set empty in that case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 716905)
    Remaining Estimate: 0h
            Time Spent: 10m

> Prevent duplicate paths in the fileList while adding an entry to NotifcationLog
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-25840
>                 URL: https://issues.apache.org/jira/browse/HIVE-25840
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Arko Sharma
>            Assignee: Arko Sharma
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As of now, while adding entries to notification logs, in case of retries, sometimes the same path gets added to the notification log entry, which during replication leads to failures during copy.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)