You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2017/04/07 19:56:41 UTC

[jira] [Updated] (TEZ-3685) ShuffleHandler completedInputSet off-by-one error

     [ https://issues.apache.org/jira/browse/TEZ-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated TEZ-3685:
---------------------------------
    Description: 
Per comment in TEZ-3334 https://issues.apache.org/jira/browse/TEZ-3334?focusedCommentId=15950550&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15950550

Addressing The off by 1 issue / other correctness issues in this jira
- ShuffleManager: has moved to using a BitSet instead of a ConcurrentSet. May need to look at the synchronization points required for the bitset. (access to the map had some synchronized and some non synchronized access)
- Why was this removed?
{code}
-    // Sanity check
-    // we are guaranteed that key is not null
-    if (srcAttemptsRemaining.get(srcAttemptId.toString()) == null) {
-      // wrongMapErrs.increment(1);
-      LOG.warn("Invalid input. Received output for headerPathComponent: "
-          + pathComponent + "nextRemainingSrcAttemptId: "
-          + getNextRemainingAttempt() + ", mappedSrcAttemptId: " + srcAttemptId);
-      return false;
-    }
{code}
- 
Is there an off by 1 error here? Specifically. input.getInputIdentifier() + ((CompositeInputAttemptIdentifier) input).getInputIdentifierCount(). Think getInputIdentifierCount is always at least 1.
{code}
+      if(input instanceof CompositeInputAttemptIdentifier) {
+        // LLL Is there an off by 0 case here. i.e. getInputIdentiferCount will be set to 1 for a regular event.
+        // We start scanning the bitst early. Also depends on how completedInputSet is setup.
+        if (completedInputSet.nextClearBit(input.getInputIdentifier()) >=
+            input.getInputIdentifier() + ((CompositeInputAttemptIdentifier) input).getInputIdentifierCount()) {
+          inputIter.remove();
+          continue;
+        }
+      } else {
+        if (completedInputSet.get(input.getInputIdentifier())) {
+          inputIter.remove();
+          continue;
+        }
{code}

  was:
Per comment in TEZ-3334 https://issues.apache.org/jira/browse/TEZ-3334?focusedCommentId=15950550&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15950550

Addressing The off by 1 issue / other correctness issues in this jira
- ShuffleManager: has moved to using a BitSet instead of a ConcurrentSet. May need to look at the synchronization points required for the bitset. (access to the map had some synchronized and some non synchronized access)
- In optimizeLocalFetch - report success only after an entire source is done (i.e. all partitions for it). Otherwise the same input may be reported multiple times, which could be problematic.
- Why was this removed?
{code}
-    // Sanity check
-    // we are guaranteed that key is not null
-    if (srcAttemptsRemaining.get(srcAttemptId.toString()) == null) {
-      // wrongMapErrs.increment(1);
-      LOG.warn("Invalid input. Received output for headerPathComponent: "
-          + pathComponent + "nextRemainingSrcAttemptId: "
-          + getNextRemainingAttempt() + ", mappedSrcAttemptId: " + srcAttemptId);
-      return false;
-    }
{code}
- 
Is there an off by 1 error here? Specifically. input.getInputIdentifier() + ((CompositeInputAttemptIdentifier) input).getInputIdentifierCount(). Think getInputIdentifierCount is always at least 1.
{code}
+      if(input instanceof CompositeInputAttemptIdentifier) {
+        // LLL Is there an off by 0 case here. i.e. getInputIdentiferCount will be set to 1 for a regular event.
+        // We start scanning the bitst early. Also depends on how completedInputSet is setup.
+        if (completedInputSet.nextClearBit(input.getInputIdentifier()) >=
+            input.getInputIdentifier() + ((CompositeInputAttemptIdentifier) input).getInputIdentifierCount()) {
+          inputIter.remove();
+          continue;
+        }
+      } else {
+        if (completedInputSet.get(input.getInputIdentifier())) {
+          inputIter.remove();
+          continue;
+        }
{code}


> ShuffleHandler completedInputSet off-by-one error
> -------------------------------------------------
>
>                 Key: TEZ-3685
>                 URL: https://issues.apache.org/jira/browse/TEZ-3685
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>
> Per comment in TEZ-3334 https://issues.apache.org/jira/browse/TEZ-3334?focusedCommentId=15950550&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15950550
> Addressing The off by 1 issue / other correctness issues in this jira
> - ShuffleManager: has moved to using a BitSet instead of a ConcurrentSet. May need to look at the synchronization points required for the bitset. (access to the map had some synchronized and some non synchronized access)
> - Why was this removed?
> {code}
> -    // Sanity check
> -    // we are guaranteed that key is not null
> -    if (srcAttemptsRemaining.get(srcAttemptId.toString()) == null) {
> -      // wrongMapErrs.increment(1);
> -      LOG.warn("Invalid input. Received output for headerPathComponent: "
> -          + pathComponent + "nextRemainingSrcAttemptId: "
> -          + getNextRemainingAttempt() + ", mappedSrcAttemptId: " + srcAttemptId);
> -      return false;
> -    }
> {code}
> - 
> Is there an off by 1 error here? Specifically. input.getInputIdentifier() + ((CompositeInputAttemptIdentifier) input).getInputIdentifierCount(). Think getInputIdentifierCount is always at least 1.
> {code}
> +      if(input instanceof CompositeInputAttemptIdentifier) {
> +        // LLL Is there an off by 0 case here. i.e. getInputIdentiferCount will be set to 1 for a regular event.
> +        // We start scanning the bitst early. Also depends on how completedInputSet is setup.
> +        if (completedInputSet.nextClearBit(input.getInputIdentifier()) >=
> +            input.getInputIdentifier() + ((CompositeInputAttemptIdentifier) input).getInputIdentifierCount()) {
> +          inputIter.remove();
> +          continue;
> +        }
> +      } else {
> +        if (completedInputSet.get(input.getInputIdentifier())) {
> +          inputIter.remove();
> +          continue;
> +        }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)