You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by vrozov <gi...@git.apache.org> on 2018/02/06 05:55:15 UTC

[GitHub] drill pull request #1113: DRILL-5902: Regression: Queries encounter random f...

GitHub user vrozov opened a pull request:

    https://github.com/apache/drill/pull/1113

    DRILL-5902: Regression: Queries encounter random failure due to RPC connection timed out

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vrozov/drill DRILL-5902

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1113.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1113
    
----
commit fe329c2517710bb9fdec273de24321717fc954e6
Author: Vlad Rozov <vr...@...>
Date:   2018-02-06T03:15:56Z

    DRILL-5902: Regression: Queries encounter random failure due to RPC connection timed out

----


---

[GitHub] drill pull request #1113: DRILL-5902: Regression: Queries encounter random f...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1113#discussion_r168699838
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStateProcessor.java ---
    @@ -125,20 +125,17 @@ public void cancel() {
           case PREPARING:
           case PLANNING:
           case ENQUEUED:
    -        moveToState(QueryState.CANCELLATION_REQUESTED, null);
    -        return;
    -
           case STARTING:
           case RUNNING:
    -        addToEventQueue(QueryState.CANCELLATION_REQUESTED, null);
    -        return;
    +        moveToState(QueryState.CANCELLATION_REQUESTED, null);
    --- End diff --
    
    1. Your point makes sense. In this case could you please update java doc for the `cancel` method to be consistent with new changes?
    2. Maybe we should remove word `regression` from the commit message to avoid confusion?


---

[GitHub] drill issue #1113: DRILL-5902: Regression: Queries encounter random failure ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on the issue:

    https://github.com/apache/drill/pull/1113
  
    @vrozov Can you provide a brief description of the issue? I have recently filed https://issues.apache.org/jira/browse/DRILL-6143 and I want to verify that these are two separate issues. 
    
    DRILL-6143 causes a premature timeout when fragments are sent to drillbits in the FragmentsRunner. The issue you fixed here seems to involve a timeout when a query is cancelled. So my initial guess is that these two issues are unrelated. Please let me know if they are related.


---

[GitHub] drill pull request #1113: DRILL-5902: Queries encounter random failure due t...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/drill/pull/1113


---

[GitHub] drill pull request #1113: DRILL-5902: Regression: Queries encounter random f...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1113#discussion_r168599100
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStateProcessor.java ---
    @@ -125,20 +125,17 @@ public void cancel() {
           case PREPARING:
           case PLANNING:
           case ENQUEUED:
    -        moveToState(QueryState.CANCELLATION_REQUESTED, null);
    -        return;
    -
           case STARTING:
           case RUNNING:
    -        addToEventQueue(QueryState.CANCELLATION_REQUESTED, null);
    -        return;
    +        moveToState(QueryState.CANCELLATION_REQUESTED, null);
    --- End diff --
    
    1. Per my understanding, only during a transition from `STARTING` to `RUNNING` it is necessary to delay the processing of `CANCELLATION_REQUESTED` till requests are submitted to remote nodes for execution. Once the state is transitioned to `RUNNING`, remote drillbits are ready to start processing cancellation request, so no delay is necessary for the `RUNNING` state. In case of `STARTING` there is already a call to `addToEventQueue()` inside `QueryStateProcessor.starting()` that ensures that cancellation will be processed after the transition.
    1. I can't say why JIRA title states that it is a regression, as far as I can tell, any delay in saving query profile may cause the issue. One possibility is that the issue is amplified by [DRILL-6053](https://issues.apache.org/jira/browse/DRILL-6053) that is a regression caused by the fix for [DRILL-4963](https://issues.apache.org/jira/browse/DRILL-4963) that introduces even longer delay (due to synchronization).


---

[GitHub] drill issue #1113: DRILL-5902: Regression: Queries encounter random failure ...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on the issue:

    https://github.com/apache/drill/pull/1113
  
    @arina-ielchiieva Please review


---

[GitHub] drill issue #1113: DRILL-5902: Queries encounter random failure due to RPC c...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1113
  
    +1, LGTM.


---

[GitHub] drill pull request #1113: DRILL-5902: Regression: Queries encounter random f...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1113#discussion_r168415470
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStateProcessor.java ---
    @@ -125,20 +125,17 @@ public void cancel() {
           case PREPARING:
           case PLANNING:
           case ENQUEUED:
    -        moveToState(QueryState.CANCELLATION_REQUESTED, null);
    -        return;
    -
           case STARTING:
           case RUNNING:
    -        addToEventQueue(QueryState.CANCELLATION_REQUESTED, null);
    -        return;
    +        moveToState(QueryState.CANCELLATION_REQUESTED, null);
    --- End diff --
    
    1. Could you please explain why `addToEventQueue` was changed to `moveToState`. Per my understanding `addToEventQueue` is used to ensure that cancellation will be requested only when all fragments are sent out to avoid hanging in cancellation requested state. For preparing, planning and enqueued states we cancel immediately since these states are done locally.
    2. Why this Jira title states it's a regression? Do we know what has caused the regression?


---

[GitHub] drill issue #1113: DRILL-5902: Regression: Queries encounter random failure ...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on the issue:

    https://github.com/apache/drill/pull/1113
  
    @ilooner DRILL-6143 is not related to DRILL-5902. DRILL-6143 requires a separate RCA. See DRILL-5902 for details.


---

[GitHub] drill pull request #1113: DRILL-5902: Regression: Queries encounter random f...

Posted by vrozov <gi...@git.apache.org>.
Github user vrozov commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1113#discussion_r168935308
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/work/foreman/QueryStateProcessor.java ---
    @@ -125,20 +125,17 @@ public void cancel() {
           case PREPARING:
           case PLANNING:
           case ENQUEUED:
    -        moveToState(QueryState.CANCELLATION_REQUESTED, null);
    -        return;
    -
           case STARTING:
           case RUNNING:
    -        addToEventQueue(QueryState.CANCELLATION_REQUESTED, null);
    -        return;
    +        moveToState(QueryState.CANCELLATION_REQUESTED, null);
    --- End diff --
    
    @arina-ielchiieva Please update JIRA title.


---

[GitHub] drill issue #1113: DRILL-5902: Regression: Queries encounter random failure ...

Posted by ilooner <gi...@git.apache.org>.
Github user ilooner commented on the issue:

    https://github.com/apache/drill/pull/1113
  
    K thx @vrozov 


---