Posted to issues@drill.apache.org by "Jacques Nadeau (JIRA)" <ji...@apache.org> on 2015/05/16 19:19:59 UTC

[jira] [Commented] (DRILL-3118) "java.lang.IndexOutOfBoundsException" if the source data has a "dir0" column

    [ https://issues.apache.org/jira/browse/DRILL-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546865#comment-14546865 ] 

Jacques Nadeau commented on DRILL-3118:
---------------------------------------

You can set drill.exec.storage.file.partition.column.label as a SESSION option, and that should override it for just your session. Does that work for this use case, or are you having problems with that as well?
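For example, a session-scoped override might look like the following (a sketch; the replacement label {{pdir}} is just an illustration):

{code}
ALTER SESSION SET `drill.exec.storage.file.partition.column.label` = 'pdir';
{code}

Queries in the same session would then expose partition directories as pdir0, pdir1, ... instead of dir0, dir1, ..., leaving the source data's own "dir0" column reachable.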

> "java.lang.IndexOutOfBoundsException" if the source data has a "dir0" column
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-3118
>                 URL: https://issues.apache.org/jira/browse/DRILL-3118
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.0.0
>            Reporter: Hao Zhu
>            Assignee: Chris Westin
>
> Tested on 1.0 with commit id:
> {code}
> select commit_id from sys.version;
> +-------------------------------------------+
> |                 commit_id                 |
> +-------------------------------------------+
> | d8b19759657698581cc0d01d7038797952888123  |
> +-------------------------------------------+
> 1 row selected (0.097 seconds)
> {code}
> When the source data has column names like "dir0", "dir1", ..., the query may fail with "java.lang.IndexOutOfBoundsException".
> For example:
> {code}
> > select `dir999` from dfs.root.`user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet`;
> Error: SYSTEM ERROR: java.lang.IndexOutOfBoundsException: index: 0, length: 4 (expected: range(0, 0))
> Fragment 0:0
> [Error Id: d289b3d7-1172-4ed7-b679-7af80d9aca7c on h1.poc.com:31010]
>   (org.apache.drill.common.exceptions.DrillRuntimeException) Error in parquet record reader.
> Message:
> Hadoop path: /user/hive/warehouse/testdir999/3d49fc1fd0bc7e81-e6c5bb9affac8684_358897896_data.parquet
> Total records read: 0
> Mock records read: 0
> Records to read: 32768
> Row group index: 0
> Records in row group: 1
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
>   optional int32 id;
>   optional binary dir999;
> }
> , metadata: {}}, blocks: [BlockMetaData{1, 98 [ColumnMetaData{SNAPPY [id] INT32  [PLAIN, RLE, PLAIN_DICTIONARY], 23}, ColumnMetaData{SNAPPY [dir999] BINARY  [PLAIN, RLE, PLAIN_DICTIONARY], 103}]}]}
>     org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise():339
>     org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():441
>     org.apache.drill.exec.physical.impl.ScanBatch.next():175
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():83
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():73
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1469
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745
>   Caused By (java.lang.IndexOutOfBoundsException) index: 0, length: 4 (expected: range(0, 0))
>     io.netty.buffer.DrillBuf.checkIndexD():189
>     io.netty.buffer.DrillBuf.chk():211
>     io.netty.buffer.DrillBuf.getInt():491
>     org.apache.drill.exec.vector.UInt4Vector$Accessor.get():321
>     org.apache.drill.exec.vector.VarBinaryVector$Mutator.setSafe():481
>     org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.fillEmpties():408
>     org.apache.drill.exec.vector.NullableVarBinaryVector$Mutator.setValueCount():513
>     org.apache.drill.exec.store.parquet.columnreaders.VarLenBinaryReader.readFields():78
>     org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next():425
>     org.apache.drill.exec.physical.impl.ScanBatch.next():175
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():83
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():80
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():73
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():259
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():253
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1469
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():253
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745 (state=,code=0)
> {code}
> My thought:
> We need to fix this by one or more of the following:
> 1. Prompt a readable message saying "dirN" is a reserved column name, and suggest changing drill.exec.storage.file.partition.column.label to something else.
> 2. And/or, if the source data has dirN columns, they should override our reserved "dirN" partition labels.
> 3. Document "drill.exec.storage.file.partition.column.label" in http://drill.apache.org/docs/querying-directories/
> 4. drill.exec.storage.file.partition.column.label is a system-level configuration; if we use it as a workaround, it will impact the whole system. Can we make it settable at the session level?
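> As a sketch, the option's current scope and value can be inspected through the sys.options table (column names here assume the 1.0 layout; verify against your Drill version):
> {code}
> SELECT name, type, string_val
> FROM sys.options
> WHERE name = 'drill.exec.storage.file.partition.column.label';
> {code}
> The type column indicates whether the value in effect was set at SYSTEM or SESSION scope.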



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)