Posted to issues@drill.apache.org by "Abhishek Girish (JIRA)" <ji...@apache.org> on 2017/02/28 21:58:45 UTC

[jira] [Updated] (DRILL-5304) Queries fail intermittently when there is skew in data distribution

     [ https://issues.apache.org/jira/browse/DRILL-5304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Girish updated DRILL-5304:
-----------------------------------
    Description: 
In a distributed environment, we've observed certain queries failing intermittently with an assignment logic error when the underlying data is unevenly distributed across nodes.

For example, TPC-H [query 7|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Advanced/tpch/tpch_sf100/parquet/07.q] failed with the following error:
{code}
java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: MinorFragmentId 105 has no read entries assigned
...
  (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: MinorFragmentId 105 has no read entries assigned
    org.apache.drill.exec.work.foreman.Foreman.run():281
    java.util.concurrent.ThreadPoolExecutor.runWorker():1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run():615
    java.lang.Thread.run():744
  Caused By (java.lang.IllegalArgumentException) MinorFragmentId 105 has no read entries assigned
{code}

A log containing the full stack trace is attached.

For this query, the underlying TPC-H SF100 Parquet dataset was observed to reside mostly on 2-3 nodes of an 8-node DFS environment. The data distribution skew on this cluster is most likely the trigger, as the same query on the same dataset does not fail on a different test cluster (with possibly different data distribution).
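
To illustrate how skew alone could produce this kind of error, here is a minimal sketch. It is not Drill's actual assignment code. It assumes a hypothetical locality-only policy in which a work unit (e.g. a Parquet row group) may only go to a minor fragment running on the node that stores it, and every name in it (assign_by_locality, check_assignment, node0..node7) is made up for the example. It also assumes every host that stores data runs at least one fragment.

```python
from collections import defaultdict
from itertools import cycle


def assign_by_locality(fragment_hosts, work_unit_hosts):
    """Assign each work unit to a fragment on the host that stores it.

    fragment_hosts: list where index = minor fragment id, value = host name.
    work_unit_hosts: list where index = work unit id, value = host name.
    Returns {minor_fragment_id: [work_unit_id, ...]}.
    Hypothetical policy for illustration only, not Drill's real logic.
    """
    fragments_on_host = defaultdict(list)
    for fragment_id, host in enumerate(fragment_hosts):
        fragments_on_host[host].append(fragment_id)

    # Round-robin iterators over the fragments running on each host.
    next_fragment = {h: cycle(ids) for h, ids in fragments_on_host.items()}

    assignment = {fragment_id: [] for fragment_id in range(len(fragment_hosts))}
    for unit_id, host in enumerate(work_unit_hosts):
        assignment[next(next_fragment[host])].append(unit_id)
    return assignment


def check_assignment(assignment):
    """Raise an error worded like the one in this report if a fragment is empty."""
    for fragment_id, units in sorted(assignment.items()):
        if not units:
            raise ValueError(
                f"MinorFragmentId {fragment_id} has no read entries assigned"
            )


# 8 nodes with one minor fragment each; all 6 work units live on 3 of the nodes.
fragment_hosts = [f"node{i}" for i in range(8)]
skewed_units = ["node0", "node1", "node2", "node0", "node1", "node2"]

skewed = assign_by_locality(fragment_hosts, skewed_units)
empty_fragments = [f for f, units in skewed.items() if not units]
```

Under these assumptions, the fragments on the five nodes that hold no data end up with empty lists, and check_assignment(skewed) raises for the first of them. With data spread evenly over all eight nodes, the same policy gives every fragment at least one unit, which would be consistent with the query passing on a cluster with a different data layout.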

Another [query|https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/limit0/window_functions/bugs/data/drill-3700.sql] failed with a similar error when the slice target was set to 1.
{code}
Failed with exception
java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: MinorFragmentId 66 has no read entries assigned
...
  (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: MinorFragmentId 66 has no read entries assigned
{code}


  was:
In a distributed environment, we've observed certain queries to fail execution intermittently, with an assignment logic issue, when the underlying data is skewed w.r.t distribution. 

For example the TPC-H query 7 failed with the below error:
{code}
java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: MinorFragmentId 105 has no read entries assigned
...
  (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: MinorFragmentId 105 has no read entries assigned
    org.apache.drill.exec.work.foreman.Foreman.run():281
    java.util.concurrent.ThreadPoolExecutor.runWorker():1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run():615
    java.lang.Thread.run():744
  Caused By (java.lang.IllegalArgumentException) MinorFragmentId 105 has no read entries assigned
{code}

Log containing full stack trace is attached.

And for this query, the underlying TPC-H SF100 Parquet dataset was observed to be located mostly only on 2-3 nodes on an 8 node DFS environment. The data distribution skew on this cluster is most likely the triggering factor for this case, as the same query, on the same dataset does not show this failure on a different test cluster (with possibly different data distribution). 

Also, another [query](https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/limit0/window_functions/bugs/data/drill-3700.sql) failed with a similar error when slice target was set to 1. 
{code}
Failed with exception
java.sql.SQLException: SYSTEM ERROR: IllegalArgumentException: MinorFragmentId 66 has no read entries assigned
...
  (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception during fragment initialization: MinorFragmentId 66 has no read entries assigned
{code}



> Queries fail intermittently when there is skew in data distribution
> -------------------------------------------------------------------
>
>                 Key: DRILL-5304
>                 URL: https://issues.apache.org/jira/browse/DRILL-5304
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.10.0
>            Reporter: Abhishek Girish
>            Assignee: Padma Penumarthy



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)