Posted to dev@drill.apache.org by "Paul Wilson (JIRA)" <ji...@apache.org> on 2017/04/28 10:12:04 UTC
[jira] [Created] (DRILL-5451) Query on csv file w/ header fails
with an exception when non existing column is requested if file is over
4096 lines long
Paul Wilson created DRILL-5451:
----------------------------------
Summary: Query on csv file w/ header fails with an exception when non existing column is requested if file is over 4096 lines long
Key: DRILL-5451
URL: https://issues.apache.org/jira/browse/DRILL-5451
Project: Apache Drill
Issue Type: Bug
Components: Storage - Text & CSV
Affects Versions: 1.10.0
Environment: Tested on CentOs 7 and Ubuntu
Reporter: Paul Wilson
When querying a text (CSV) file with extractHeader set to true, selecting a nonexistent column works as expected (returns an "empty" value) when the file has 4096 lines or fewer (1 header plus 4095 data lines), but results in an IndexOutOfBoundsException when the file has 4097 lines or more.
With Storage config:
{code:javascript}
"csvh": {
  "type": "text",
  "extensions": [
    "csvh"
  ],
  "extractHeader": true,
  "delimiter": ","
}
{code}
In the following, 4096_lines.csvh is identical to 4097_lines.csvh with the last line removed.
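For anyone wanting to reproduce this, the two test files can probably be regenerated with something like the sketch below. The original files are not attached here, so the helper name write_csvh is hypothetical and the exact contents are an assumption inferred from the query output in this report (a line_no,line_description header, with data lines numbered from 2).

```python
from pathlib import Path


def write_csvh(path, total_lines):
    """Write a .csvh file with one header line plus (total_lines - 1) data lines.

    Hypothetical helper: the column names and row text are inferred from the
    query output in this report, not taken from the original test files.
    """
    rows = ["line_no,line_description"]
    # Data lines are numbered from 2 so line_no matches the physical line number.
    rows += [f"{n},this is line number {n}" for n in range(2, total_lines + 1)]
    Path(path).write_text("\n".join(rows) + "\n")
    return len(rows)


if __name__ == "__main__":
    write_csvh("4096_lines.csvh", 4096)  # 1 header + 4095 data lines
    write_csvh("4097_lines.csvh", 4097)  # 1 header + 4096 data lines
```

The files then need to be placed wherever the dfs workspace can read them (/test/ in the queries in this report).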
Results:
{noformat}
0: jdbc:drill:zk=local> select * from dfs.`/test/4097_lines.csvh` LIMIT 2;
+----------+------------------------+
| line_no | line_description |
+----------+------------------------+
| 2 | this is line number 2 |
| 3 | this is line number 3 |
+----------+------------------------+
2 rows selected (2.455 seconds)
0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4096_lines.csvh` LIMIT 2;
+----------+---------------------+
| line_no | non_existent_field |
+----------+---------------------+
| 2 | |
| 3 | |
+----------+---------------------+
2 rows selected (2.248 seconds)
0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4097_lines.csvh` LIMIT 2;
Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))
Fragment 0:0
[Error Id: eb0974a8-026d-4048-9f10-ffb821a0d300 on localhost:31010]
(java.lang.IndexOutOfBoundsException) index: 16384, length: 4 (expected: range(0, 16384))
io.netty.buffer.DrillBuf.checkIndexD():123
io.netty.buffer.DrillBuf.chk():147
io.netty.buffer.DrillBuf.getInt():520
org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount():659
org.apache.drill.exec.physical.impl.ScanBatch.next():234
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.record.AbstractRecordBatch.next():119
org.apache.drill.exec.record.AbstractRecordBatch.next():109
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
0: jdbc:drill:zk=local>
{noformat}
This seems similar to the issue fixed in [DRILL-4108|https://issues.apache.org/jira/browse/DRILL-4108] but it now only manifests for longer files.
I also see a similar result (i.e. it works for <= 4096 lines and throws an IndexOutOfBoundsException for > 4096 lines) for a {noformat}SELECT count(*) ...{noformat} query on these files.
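One possible reading of the numbers in the exception, offered as a guess rather than a diagnosis: 16384 bytes is exactly 4096 four-byte entries, and the trace shows a UInt4Vector read from VarCharVector setValueCount(). The sketch below only checks that arithmetic. It assumes (not confirmed against the Drill source) that setValueCount(n) reads the 4-byte offset at index n, that the offset buffer for the missing column holds 16384 bytes, and that all data rows land in a single batch.

```python
OFFSET_WIDTH = 4      # bytes per UInt4 offset entry ("length: 4" in the error)
BUFFER_BYTES = 16384  # upper bound from "expected: range(0, 16384)"


def offset_read_fits(value_count):
    """True if a 4-byte read at offset index value_count stays inside the buffer.

    Assumption: the read starts at byte value_count * OFFSET_WIDTH.
    """
    start = value_count * OFFSET_WIDTH
    return start + OFFSET_WIDTH <= BUFFER_BYTES


for total_lines in (4096, 4097):
    data_rows = total_lines - 1  # one header line is consumed by extractHeader
    print(total_lines, data_rows * OFFSET_WIDTH, offset_read_fits(data_rows))
# prints:
# 4096 16380 True
# 4097 16384 False
```

Under those assumptions the failing read starts at byte 16384, which matches "index: 16384" in the error, so the 4096/4097 boundary may simply be where the offset buffer for the nonexistent column runs out.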