Posted to issues@drill.apache.org by "Boaz Ben-Zvi (JIRA)" <ji...@apache.org> on 2016/09/02 19:01:20 UTC

[jira] [Commented] (DRILL-3898) No space error during external sort does not cancel the query

    [ https://issues.apache.org/jira/browse/DRILL-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459314#comment-15459314 ] 

Boaz Ben-Zvi commented on DRILL-3898:
-------------------------------------

   Testing further with 1.8 in embedded mode: the NPE shows up when there is enough spill disk space (or no spilling occurs). However, when the space is restricted, the "No space left on device" error comes up but the NPE does not, and the table is not created (log is attached).
This implies that in the current code, the query cancellation does propagate correctly.
May need to test again on a cluster, with the full SF100, before closing this bug.
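
One detail in the quoted stack trace below may explain how the original leak happened: org.apache.hadoop.fs.FSError extends java.lang.Error, not IOException, so a catch clause that only handles IOException during spill-file cleanup would let it escape the fragment's normal failure path (hence the "leaked an exception" message from WorkManager). A minimal sketch of the defensive pattern, with hypothetical class and method names rather than Drill's actual code:

```java
import java.io.IOException;

public class SpillCloseSketch {

    // Stand-in for org.apache.hadoop.fs.FSError, which extends Error.
    static class FsLikeError extends Error {
        FsLikeError(String msg) { super(msg); }
    }

    // Simulates a close() failing the way the stack trace shows: the
    // final buffered flush hits "No space left on device".
    static void closeSpillFile() {
        throw new FsLikeError("java.io.IOException: No space left on device");
    }

    // Wrapper: catch Error as well, converting it to a checked
    // IOException so the caller's failure path (where a query would be
    // cancelled) still runs instead of the error leaking off the thread.
    static void safeClose() throws IOException {
        try {
            closeSpillFile();
        } catch (Error e) {
            throw new IOException("spill file close failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) throws Exception {
        boolean failurePathRan = false;
        try {
            safeClose();
        } catch (IOException e) {
            failurePathRan = true; // here the query would be failed/cancelled
        }
        System.out.println("failurePathRan=" + failurePathRan);
    }
}
```

This only illustrates the Error-vs-IOException distinction; whether Drill's fix took this exact shape is not stated here.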

  

> No space error during external sort does not cancel the query
> -------------------------------------------------------------
>
>                 Key: DRILL-3898
>                 URL: https://issues.apache.org/jira/browse/DRILL-3898
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.2.0, 1.8.0
>            Reporter: Victoria Markman
>            Assignee: Boaz Ben-Zvi
>             Fix For: Future
>
>         Attachments: drillbit.log
>
>
> While verifying DRILL-3732 I ran into a new problem.
> I think Drill somehow loses track of the out-of-disk exception and does not cancel the rest of the query, which results in an NPE:
> Reproduction is the same as in DRILL-3732:
> {code}
> 0: jdbc:drill:schema=dfs> create table store_sales_20(ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, s_sold_date_sk, ss_promo_sk) partition by (ss_promo_sk) as
> . . . . . . . . . . . . >  select 
> . . . . . . . . . . . . >      case when columns[2] = '' then cast(null as varchar(100)) else cast(columns[2] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[3] = '' then cast(null as varchar(100)) else cast(columns[3] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[4] = '' then cast(null as varchar(100)) else cast(columns[4] as varchar(100)) end, 
> . . . . . . . . . . . . >      case when columns[5] = '' then cast(null as varchar(100)) else cast(columns[5] as varchar(100)) end, 
> . . . . . . . . . . . . >      case when columns[0] = '' then cast(null as varchar(100)) else cast(columns[0] as varchar(100)) end, 
> . . . . . . . . . . . . >      case when columns[8] = '' then cast(null as varchar(100)) else cast(columns[8] as varchar(100)) end
> . . . . . . . . . . . . >  from 
> . . . . . . . . . . . . >           `store_sales.dat` ss     
> . . . . . . . . . . . . > ;
> Error: SYSTEM ERROR: NullPointerException
> Fragment 1:16
> [Error Id: 0ae9338d-d04f-4b4a-93aa-a80d13cedb29 on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> This exception in drillbit.log should have triggered query cancellation:
> {code}
> 2015-10-06 17:01:34,463 [WorkManager-2] ERROR o.apache.drill.exec.work.WorkManager - org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
> org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.7.0_71]
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.7.0_71]
>         at java.io.FilterOutputStream.close(FilterOutputStream.java:157) ~[na:1.7.0_71]
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:400) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.drill.exec.physical.impl.xsort.BatchGroup.close(BatchGroup.java:152) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:44) ~[drill-common-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:553) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:362) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:91) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_71]
>         at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_71]
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) ~[drill-common-1.2.0.jar:1.2.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.7.0_71]
>         at java.io.FileOutputStream.write(FileOutputStream.java:345) ~[na:1.7.0_71]
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:224) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         ... 45 common frames omitted
> {code}
> I'm attaching the full drillbit.log



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)