You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Arina Ielchiieva (Jira)" <ji...@apache.org> on 2019/10/11 10:35:00 UTC
[jira] [Commented] (DRILL-5451) Query on csv file w/ header fails with an exception when non existing column is requested if file is over 4096 lines long

    [ https://issues.apache.org/jira/browse/DRILL-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949353#comment-16949353 ] 

Arina Ielchiieva commented on DRILL-5451:
-----------------------------------------

[~Nevo] please file separate ticket for Parquet failure with reproduce data, of course if issue still persists.

> Query on csv file w/ header fails with an exception when non existing column is requested if file is over 4096 lines long
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5451
>                 URL: https://issues.apache.org/jira/browse/DRILL-5451
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text &amp; CSV
>    Affects Versions: 1.10.0
>         Environment: Tested on CentOs 7 and Ubuntu
>            Reporter: Paul Wilson
>            Assignee: Paul Rogers
>            Priority: Major
>         Attachments: 4097_lines.csvh
>
>
> When querying a text (csv) file with extractHeaders set to true, selecting a non existent column works as expected (returns "empty" value) when file has 4096 lines or fewer (1 header plus 4095 data), but results in an IndexOutOfBoundsException where the file has 4097 lines or more.
> With Storage config:
> {code:javascript}
> "csvh": {
>       "type": "text",
>       "extensions": [
>         "csvh"
>       ],
>       "extractHeader": true,
>       "delimiter": ","
>     }
> {code}
> In the following 4096_lines.csvh has is identical to 4097_lines.csvh with the last line removed.
> Results:
> {noformat}
> 0: jdbc:drill:zk=local> select * from dfs.`/test/4097_lines.csvh` LIMIT 2;
> +----------+------------------------+
> | line_no  |    line_description    |
> +----------+------------------------+
> | 2        | this is line number 2  |
> | 3        | this is line number 3  |
> +----------+------------------------+
> 2 rows selected (2.455 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4096_lines.csvh` LIMIT 2;
> +----------+---------------------+
> | line_no  | non_existent_field  |
> +----------+---------------------+
> | 2        |                     |
> | 3        |                     |
> +----------+---------------------+
> 2 rows selected (2.248 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4097_lines.csvh` LIMIT 2;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))
> Fragment 0:0
> [Error Id: eb0974a8-026d-4048-9f10-ffb821a0d300 on localhost:31010]
>   (java.lang.IndexOutOfBoundsException) index: 16384, length: 4 (expected: range(0, 16384))
>     io.netty.buffer.DrillBuf.checkIndexD():123
>     io.netty.buffer.DrillBuf.chk():147
>     io.netty.buffer.DrillBuf.getInt():520
>     org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
>     org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount():659
>     org.apache.drill.exec.physical.impl.ScanBatch.next():234
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1657
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745 (state=,code=0)
> 0: jdbc:drill:zk=local> 
> {noformat}
> This seems similar to the issue fixed in [DRILL-4108|https://issues.apache.org/jira/browse/DRILL-4108] but it now only manifests for longer files.
> I also see a similar result (i.e. it works for <= 4096 lines, IOBE for >4096 lines) for a {noformat} SELECT count(*) ...{noformat} from these files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)