Posted to notifications@asterixdb.apache.org by "Ingo Müller (Jira)" <ji...@apache.org> on 2021/06/17 13:02:00 UTC

[jira] [Created] (ASTERIXDB-2918) IndexOutOfBoundsException when querying Parquet files

Ingo Müller created ASTERIXDB-2918:
--------------------------------------

             Summary: IndexOutOfBoundsException when querying Parquet files
                 Key: ASTERIXDB-2918
                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2918
             Project: Apache AsterixDB
          Issue Type: Bug
          Components: EXT - External data
            Reporter: Ingo Müller
         Attachments: Run2012B_SingleMu_restructured_1000.parquet, create_event_type.sqlpp

I am getting an IndexOutOfBoundsException when creating an external table based on Parquet files on HDFS, or when loading them into an existing table, if I specify a closed type for the table. If I specify an empty open type as follows, everything works fine:

{{CREATE TYPE anyType IF NOT EXISTS AS OPEN {};}}

Then I create an external table as follows:

{{CREATE EXTERNAL DATASET untypedDataset(anyType)}}
{{USING hdfs}}
{{ (("hdfs"="hdfs://namenode:8020"),}}
{{ ("path"="/test/*.parquet"),}}
{{ ("input-format"="parquet-input-format"))}}

With {{anyType}}, I can query the table just fine. However, if I use the {{eventType}} created as shown in the attachment, any query against the dataset fails with an exception. In cc.log, I find the following output:

{{12:51:59.457 [HttpExecutor(port:19002)-4] WARN org.apache.asterix.api.http.server.QueryServiceServlet - handleException: unexpected exception: \{"host":"localhost:19002","path":"/query/service","statement":"SELECT * FROM Run2012B_SingleMu_1000_typed_external_parquet","pretty":false,"mode":"immediate","clientContextID":"80","format":"CLEAN_JSON","timeout":9223372036854775807,"maxResultReads":1,"planFormat":"JSON","expressionTree":false,"rewrittenExpressionTree":false,"logicalPlan":true,"optimizedLogicalPlan":true,"job":false,"profile":"counts","signature":true,"multiStatement":true,"parseOnly":false,"readOnly":false,"maxWarnings":9007199254740991}}}
{{org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.IllegalStateException: java.lang.IllegalStateException: java.lang.IndexOutOfBoundsException}}
{{ at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:70) ~[hyracks-api-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.api.util.ExceptionUtils.setNodeIds(ExceptionUtils.java:70) ~[hyracks-api-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.control.nc.Task.run(Task.java:390) ~[hyracks-control-nc-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]}}
{{ at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]}}
{{ at java.lang.Thread.run(Unknown Source) [?:?]}}
{{Caused by: java.lang.IllegalStateException: java.lang.IllegalStateException: java.lang.IndexOutOfBoundsException}}
{{ at org.apache.asterix.om.pointables.ARecordVisitablePointable.set(ARecordVisitablePointable.java:272) ~[asterix-om-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.dataflow.data.nontagged.printers.json.clean.ARecordPrinterFactory$1.print(ARecordPrinterFactory.java:61) ~[asterix-om-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.writers.PrinterBasedWriterFactory$1.printTuple(PrinterBasedWriterFactory.java:66) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.serializer.ResultSerializerFactoryProvider$1$1.appendTuple(ResultSerializerFactoryProvider.java:64) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.std.result.ResultWriterOperatorDescriptor$1.nextFrame(ResultWriterOperatorDescriptor.java:105) ~[hyracks-dataflow-std-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:94) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.appendProjectionToWriter(FrameUtils.java:264) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendProjectionToFrame(AbstractOneInputOneOutputOneFramePushRuntime.java:104) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendProjectionToFrame(AbstractOneInputOneOutputOneFramePushRuntime.java:99) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.std.StreamProjectRuntimeFactory$1.nextFrame(StreamProjectRuntimeFactory.java:74) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:94) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.appendToWriter(FrameUtils.java:185) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:91) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:87) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.nextFrame(AssignRuntimeFactory.java:132) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$1.nextFrame(AlgebricksMetaOperatorDescriptor.java:155) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:94) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.util.DataflowUtils.addTupleToFrame(DataflowUtils.java:37) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.dataflow.TupleForwarder.addTuple(TupleForwarder.java:43) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.dataflow.RecordDataFlowController.start(RecordDataFlowController.java:71) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.dataset.adapter.GenericAdapter.start(GenericAdapter.java:38) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.operators.ExternalScanOperatorDescriptor$1.initialize(ExternalScanOperatorDescriptor.java:82) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$runInParallel$0(SuperActivityOperatorNodePushable.java:227) ~[hyracks-api-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]}}
{{ ... 3 more}}
{{Caused by: java.lang.IllegalStateException: java.lang.IndexOutOfBoundsException}}
{{ at org.apache.asterix.om.pointables.ARecordVisitablePointable.set(ARecordVisitablePointable.java:272) ~[asterix-om-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.om.pointables.ARecordVisitablePointable.set(ARecordVisitablePointable.java:234) ~[asterix-om-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.dataflow.data.nontagged.printers.json.clean.ARecordPrinterFactory$1.print(ARecordPrinterFactory.java:61) ~[asterix-om-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.writers.PrinterBasedWriterFactory$1.printTuple(PrinterBasedWriterFactory.java:66) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.serializer.ResultSerializerFactoryProvider$1$1.appendTuple(ResultSerializerFactoryProvider.java:64) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.std.result.ResultWriterOperatorDescriptor$1.nextFrame(ResultWriterOperatorDescriptor.java:105) ~[hyracks-dataflow-std-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:94) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.appendProjectionToWriter(FrameUtils.java:264) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendProjectionToFrame(AbstractOneInputOneOutputOneFramePushRuntime.java:104) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendProjectionToFrame(AbstractOneInputOneOutputOneFramePushRuntime.java:99) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.std.StreamProjectRuntimeFactory$1.nextFrame(StreamProjectRuntimeFactory.java:74) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:94) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.appendToWriter(FrameUtils.java:185) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:91) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:87) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.nextFrame(AssignRuntimeFactory.java:132) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$1.nextFrame(AlgebricksMetaOperatorDescriptor.java:155) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:94) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.util.DataflowUtils.addTupleToFrame(DataflowUtils.java:37) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.dataflow.TupleForwarder.addTuple(TupleForwarder.java:43) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.dataflow.RecordDataFlowController.start(RecordDataFlowController.java:71) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.dataset.adapter.GenericAdapter.start(GenericAdapter.java:38) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.operators.ExternalScanOperatorDescriptor$1.initialize(ExternalScanOperatorDescriptor.java:82) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$runInParallel$0(SuperActivityOperatorNodePushable.java:227) ~[hyracks-api-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]}}
{{ ... 3 more}}
{{Caused by: java.lang.IndexOutOfBoundsException}}
{{ at org.apache.hyracks.data.std.util.ByteArrayAccessibleOutputStream.write(ByteArrayAccessibleOutputStream.java:75) ~[hyracks-data-std-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at java.io.DataOutputStream.write(Unknown Source) ~[?:?]}}
{{ at org.apache.asterix.om.pointables.ARecordVisitablePointable.set(ARecordVisitablePointable.java:231) ~[asterix-om-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.om.pointables.ARecordVisitablePointable.set(ARecordVisitablePointable.java:234) ~[asterix-om-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.dataflow.data.nontagged.printers.json.clean.ARecordPrinterFactory$1.print(ARecordPrinterFactory.java:61) ~[asterix-om-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.writers.PrinterBasedWriterFactory$1.printTuple(PrinterBasedWriterFactory.java:66) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.serializer.ResultSerializerFactoryProvider$1$1.appendTuple(ResultSerializerFactoryProvider.java:64) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.std.result.ResultWriterOperatorDescriptor$1.nextFrame(ResultWriterOperatorDescriptor.java:105) ~[hyracks-dataflow-std-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:94) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.appendProjectionToWriter(FrameUtils.java:264) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendProjectionToFrame(AbstractOneInputOneOutputOneFramePushRuntime.java:104) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendProjectionToFrame(AbstractOneInputOneOutputOneFramePushRuntime.java:99) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.std.StreamProjectRuntimeFactory$1.nextFrame(StreamProjectRuntimeFactory.java:74) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:94) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.appendToWriter(FrameUtils.java:185) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:91) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:87) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.nextFrame(AssignRuntimeFactory.java:132) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$1.nextFrame(AlgebricksMetaOperatorDescriptor.java:155) ~[algebricks-runtime-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:94) ~[hyracks-dataflow-common-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.util.DataflowUtils.addTupleToFrame(DataflowUtils.java:37) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.dataflow.TupleForwarder.addTuple(TupleForwarder.java:43) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.dataflow.RecordDataFlowController.start(RecordDataFlowController.java:71) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.dataset.adapter.GenericAdapter.start(GenericAdapter.java:38) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.asterix.external.operators.ExternalScanOperatorDescriptor$1.initialize(ExternalScanOperatorDescriptor.java:82) ~[asterix-external-data-0.9.7-SNAPSHOT.jar:0.9.7-SNAPSHOT]}}
{{ at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$runInParallel$0(SuperActivityOperatorNodePushable.java:227) ~[hyracks-api-0.3.7-SNAPSHOT.jar:0.3.7-SNAPSHOT]}}
{{ at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]}}
{{ ... 3 more}}

I do not know how to debug this further.

For your reference, I am using a self-compiled development build from master from a few days ago (rev. 5120106e) running on AdoptOpenJDK 15. I am also attaching the Parquet file that caused the problem.
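
To make the failure easier to reproduce, the failing request can be approximated with a small script. This is only a minimal sketch under assumptions: the endpoint is taken from the host and path in the log above, the statement is the one logged there, and the helper names {{build_query_request}} and {{run_query}} are hypothetical, not part of AsterixDB.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: endpoint assembled from the "host" and "path" fields in the log above.
QUERY_SERVICE_URL = "http://localhost:19002/query/service"


def build_query_request(statement, url=QUERY_SERVICE_URL):
    """Build (but do not send) a form-encoded POST request for one statement."""
    body = urllib.parse.urlencode({"statement": statement}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def run_query(statement, url=QUERY_SERVICE_URL):
    """Send the statement; return (HTTP status, response body as text)."""
    request = build_query_request(statement, url)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # A failed query is reported with an HTTP error status;
        # the body is returned unparsed so nothing is lost.
        return err.code, err.read().decode("utf-8")
```

Calling {{run_query("SELECT * FROM Run2012B_SingleMu_1000_typed_external_parquet")}} against a running instance should then return the error response that corresponds to the stack trace above, while the same query against {{untypedDataset}} should succeed.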
