Posted to issues@drill.apache.org by "Matt Keranen (JIRA)" <ji...@apache.org> on 2016/01/27 20:25:39 UTC

[jira] [Created] (DRILL-4317) Exceptions on SELECT and CTAS with CSV data

Matt Keranen created DRILL-4317:
-----------------------------------

             Summary: Exceptions on SELECT and CTAS with CSV data
                 Key: DRILL-4317
                 URL: https://issues.apache.org/jira/browse/DRILL-4317
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Text & CSV
    Affects Versions: 1.4.0
         Environment: 4 node cluster, Hadoop 2.7.0, 14.04.1-Ubuntu
            Reporter: Matt Keranen


Selecting from a CSV file or running a CTAS into Parquet generates exceptions.

The source file is ~650 MB: a table of 4 key columns followed by 39 numeric data columns, in an otherwise fairly simple format. Example:

{noformat}
2015-10-17 00:00,f5e9v8u2,err,fr7,226020793,76.094,26307,226020793,76.094,26307,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2015-10-17 00:00,c3f9x5z2,err,mi1,1339159295,216.004,177690,1339159295,216.004,177690,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2015-10-17 00:00,r5z2f2i9,err,mi1,7159994629,39718.011,65793,6142021303,30687.811,64630,143777403,40.521,146,75503742,41.905,89,170771174,168.165,198,192565529,370.475,222,97577280,318.068,120,62631452,288.253,68,32371173,189.527,39,41712265,299.184,46,39046408,363.418,47,34182318,465.343,43,127834582,6485.341,145
2015-10-17 00:00,j9s6i8t2,err,fr7,20580443899,277445.055,67826,2814893469,85447.816,54275,2584757097,608.001,2044,1395571268,769.113,1051,3070616988,3000.005,2284,3413811671,6489.060,2569,1772235156,5806.214,1339,1097879284,5064.120,858,691884865,4035.397,511,672967845,4815.875,518,789163614,7306.684,599,813910495,10632.464,627,1462752147,143470.306,1151
{noformat}
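
For reference, Drill reads a headerless CSV file like this as a single VARCHAR array named `columns`, so the queries in this report operate on that shape. A minimal sketch of such a SELECT, assuming a `dfs` workspace and the placeholder path used above (not the exact statement that was run):

{noformat}
-- Sketch only: workspace, path, and column aliases are assumptions.
-- Headerless CSV is exposed as one VARCHAR array named `columns`.
SELECT columns[0] AS event_time,
       columns[1] AS key1,
       columns[2] AS key2,
       columns[3] AS key3,
       columns[4] AS metric1
FROM dfs.`/path/to/file.csv`
LIMIT 10;
{noformat}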

A "SELECT from `/path/to/file.csv`" runs for tens of minutes and eventually results in:

{noformat}
java.lang.IndexOutOfBoundsException: index: 547681, length: 1 (expected: range(0, 547681))
        at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1134)
        at io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136)
        at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289)
        at io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:26)
        at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
        at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
        at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
        at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
        at org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:443)
        at org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:125)
        at org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:146)
        at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:136)
        at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:94)
        at org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
        at org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795)
        at org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179)
        at net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
        at org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420)
        at sqlline.Rows$Row.<init>(Rows.java:157)
        at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
        at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
        at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
        at sqlline.SqlLine.print(SqlLine.java:1593)
        at sqlline.Commands.execute(Commands.java:852)
        at sqlline.Commands.sql(Commands.java:751)
        at sqlline.SqlLine.dispatch(SqlLine.java:746)
        at sqlline.SqlLine.begin(SqlLine.java:621)
        at sqlline.SqlLine.start(SqlLine.java:375)
        at sqlline.SqlLine.main(SqlLine.java:268)
{noformat}
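
The CTAS writes the same file back out as Parquet. A minimal sketch of that kind of statement, assuming a writable `dfs.tmp` workspace and a placeholder table name (the exact DDL, column list, and any casts used are not shown in this report):

{noformat}
-- Sketch only: workspace, table name, and the sample cast are assumptions.
-- Ensure the output format is Parquet for the session.
ALTER SESSION SET `store.format` = 'parquet';

CREATE TABLE dfs.tmp.`csv_as_parquet` AS
SELECT columns[0]                 AS event_time,
       columns[1]                 AS key1,
       columns[2]                 AS key2,
       columns[3]                 AS key3,
       CAST(columns[4] AS BIGINT) AS metric1
FROM dfs.`/path/to/file.csv`;
{noformat}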

A CTAS on the same file, writing the output as Parquet, results in:

{noformat}
Error: SYSTEM ERROR: IllegalArgumentException: length: -260 (expected: >= 0)

Fragment 1:2

[Error Id: 1807615e-4385-4f85-8402-5900aaa568e9 on es07:31010]

  (java.lang.IllegalArgumentException) length: -260 (expected: >= 0)
    io.netty.buffer.AbstractByteBuf.checkIndex():1131
    io.netty.buffer.PooledUnsafeDirectByteBuf.nioBuffer():344
    io.netty.buffer.WrappedByteBuf.nioBuffer():727
    io.netty.buffer.UnsafeDirectLittleEndian.nioBuffer():26
    io.netty.buffer.DrillBuf.nioBuffer():356
    org.apache.drill.exec.store.ParquetOutputRecordWriter$VarCharParquetConverter.writeField():1842
    org.apache.drill.exec.store.EventBasedRecordWriter.write():62
    org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():106
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
    org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():415
    org.apache.hadoop.security.UserGroupInformation.doAs():1657
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1145
    java.util.concurrent.ThreadPoolExecutor$Worker.run():615
    java.lang.Thread.run():745 (state=,code=0)
{noformat}



