Posted to issues@drill.apache.org by "Abhishek Girish (JIRA)" <ji...@apache.org> on 2015/04/29 23:29:07 UTC

[jira] [Updated] (DRILL-2911) Queries fail with connection error when some Drillbit processes are down

     [ https://issues.apache.org/jira/browse/DRILL-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Girish updated DRILL-2911:
-----------------------------------
    Attachment: drillbit_node4.log
                drillbit_node3.log
                drillbit_node2.log
                drillbit_node1.log

Git.Commit.ID: f5b0f49 (Apr 29 2015)

> Queries fail with connection error when some Drillbit processes are down
> ------------------------------------------------------------------------
>
>                 Key: DRILL-2911
>                 URL: https://issues.apache.org/jira/browse/DRILL-2911
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 0.9.0
>            Reporter: Abhishek Girish
>            Assignee: Chris Westin
>         Attachments: drillbit_node1.log, drillbit_node2.log, drillbit_node3.log, drillbit_node4.log
>
>
> Queries fail with a connection error even though the Drill web UI shows all Drillbits as up. However, the Drillbit process is not listed on some nodes, so the cluster appears to be in an inconsistent state.
> Queries with simple scans execute successfully:
> {code:sql}
> select i_item_sk from item limit 5;
> +------------+
> | i_item_sk  |
> +------------+
> | 1          |
> | 2          |
> | 3          |
> | 4          |
> | 5          |
> +------------+
> 5 rows selected (0.112 seconds)
> {code}
> Any query that may span multiple Drillbits fails with a connection error:
> {code:sql}
> SELECT *
> FROM   item i,
>        inventory inv
> WHERE  inv.inv_item_sk = i.i_item_sk
> LIMIT  10;
> Query failed: CONNECTION ERROR: Exceeded timeout while waiting send intermediate work fragments to remote nodes.  Sent 4 and only heard response back from 3 nodes.
> [5ada1a3e-d198-478b-941d-3c9bb917e494 on abhi7.qa.lab:31010]
> Error: exception while executing query: Failure while executing query. (state=,code=0)
> {code}
> The issue may be caused by a previously failed query.
> I could not find the error code in the logs. Logs from all nodes are attached for reference.


