Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/17 17:08:31 UTC

[GitHub] [iceberg] grantatspothero edited a comment on issue #104: ManifestReader is not properly closed in BaseTableScan

grantatspothero edited a comment on issue #104:
URL: https://github.com/apache/iceberg/issues/104#issuecomment-801254786


   I'm seeing this as well with Iceberg 0.11.0 in Trino when querying a table:
   ```
   2021-03-17T11:38:29.984-0500	WARN	Finalizer	org.apache.iceberg.hadoop.HadoopStreams	Unclosed input stream created by:
   	org.apache.iceberg.hadoop.HadoopStreams$HadoopSeekableInputStream.<init>(HadoopStreams.java:80)
   	org.apache.iceberg.hadoop.HadoopStreams.wrap(HadoopStreams.java:55)
   	org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:157)
   	io.trino.plugin.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23)
   	io.trino.plugin.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:96)
   	io.trino.plugin.iceberg.HdfsInputFile.newStream(HdfsInputFile.java:60)
   	org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:95)
   	org.apache.iceberg.avro.AvroIterable.getMetadata(AvroIterable.java:66)
   	org.apache.iceberg.ManifestReader.<init>(ManifestReader.java:101)
   	org.apache.iceberg.ManifestFiles.read(ManifestFiles.java:63)
   	org.apache.iceberg.ManifestGroup.lambda$entries$13(ManifestGroup.java:220)
   	org.apache.iceberg.relocated.com.google.common.collect.Iterators$6.transform(Iterators.java:783)
   	org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
   	org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
   	org.apache.iceberg.util.ParallelIterable$ParallelIterator.submitNextTask(ParallelIterable.java:114)
   	org.apache.iceberg.util.ParallelIterable$ParallelIterator.checkTasks(ParallelIterable.java:101)
   	org.apache.iceberg.util.ParallelIterable$ParallelIterator.hasNext(ParallelIterable.java:138)
   	org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:42)
   	org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:42)
   	org.apache.iceberg.relocated.com.google.common.collect.Iterators$ConcatenatedIterator.getTopMetaIterator(Iterators.java:1309)
   	org.apache.iceberg.relocated.com.google.common.collect.Iterators$ConcatenatedIterator.hasNext(Iterators.java:1325)
   	org.apache.iceberg.io.CloseableIterator$1.hasNext(CloseableIterator.java:48)
   	org.apache.iceberg.util.BinPacking$PackingIterator.next(BinPacking.java:111)
   	org.apache.iceberg.util.BinPacking$PackingIterator.next(BinPacking.java:87)
   	org.apache.iceberg.io.CloseableIterator$1.next(CloseableIterator.java:53)
   	org.apache.iceberg.io.CloseableIterable$3$1.next(CloseableIterable.java:109)
   	java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
   	java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
   	java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
   	java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
   	java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
   	java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
   	com.google.common.collect.Iterators$7.hasNext(Iterators.java:912)
   	io.trino.plugin.iceberg.IcebergSplitSource.getNextBatch(IcebergSplitSource.java:60)
   	io.trino.plugin.base.classloader.ClassLoaderSafeConnectorSplitSource.getNextBatch(ClassLoaderSafeConnectorSplitSource.java:44)
   	io.trino.split.ConnectorAwareSplitSource.getNextBatch(ConnectorAwareSplitSource.java:54)
   	io.trino.split.BufferingSplitSource$GetNextBatch.fetchSplits(BufferingSplitSource.java:113)
   	io.trino.split.BufferingSplitSource$GetNextBatch.fetchNextBatchAsync(BufferingSplitSource.java:94)
   	io.trino.split.BufferingSplitSource.getNextBatch(BufferingSplitSource.java:54)
   	io.trino.execution.scheduler.SourcePartitionedScheduler.schedule(SourcePartitionedScheduler.java:255)
   	io.trino.execution.scheduler.SourcePartitionedScheduler$1.schedule(SourcePartitionedScheduler.java:166)
   	io.trino.execution.scheduler.SqlQueryScheduler.schedule(SqlQueryScheduler.java:550)
   	java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
   	java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
   	java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   	java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   	java.base/java.lang.Thread.run(Thread.java:835)
   ```
   
   I think this unclosed stream is the cause of the query failure, which surfaces as an error like this:
   ```
   java.lang.IndexOutOfBoundsException: undefined
   	at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:344)
   	at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
   	at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
   	at org.apache.iceberg.hadoop.HadoopStreams$HadoopSeekableInputStream.read(HadoopStreams.java:112)
   	at org.apache.iceberg.avro.AvroIO$AvroInputStreamAdapter.read(AvroIO.java:106)
   	at org.apache.avro.file.DataFileReader.openReader(DataFileReader.java:61)
   	at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:94)
   	at org.apache.iceberg.avro.AvroIterable.getMetadata(AvroIterable.java:66)
   	at org.apache.iceberg.ManifestReader.<init>(ManifestReader.java:101)
   	at org.apache.iceberg.ManifestFiles.read(ManifestFiles.java:63)
   	at org.apache.iceberg.ManifestGroup.lambda$entries$13(ManifestGroup.java:220)
   	at org.apache.iceberg.relocated.com.google.common.collect.Iterators$6.transform(Iterators.java:783)
   	at org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
   	at org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
   	at org.apache.iceberg.util.ParallelIterable$ParallelIterator.submitNextTask(ParallelIterable.java:114)
   	at org.apache.iceberg.util.ParallelIterable$ParallelIterator.checkTasks(ParallelIterable.java:101)
   	at org.apache.iceberg.util.ParallelIterable$ParallelIterator.hasNext(ParallelIterable.java:138)
   	at org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:42)
   	at org.apache.iceberg.relocated.com.google.common.collect.TransformedIterator.hasNext(TransformedIterator.java:42)
   	at org.apache.iceberg.relocated.com.google.common.collect.Iterators$ConcatenatedIterator.getTopMetaIterator(Iterators.java:1309)
   	at org.apache.iceberg.relocated.com.google.common.collect.Iterators$ConcatenatedIterator.hasNext(Iterators.java:1325)
   	at org.apache.iceberg.io.CloseableIterator$1.hasNext(CloseableIterator.java:48)
   	at org.apache.iceberg.util.BinPacking$PackingIterator.next(BinPacking.java:111)
   	at org.apache.iceberg.util.BinPacking$PackingIterator.next(BinPacking.java:87)
   	at org.apache.iceberg.io.CloseableIterator$1.next(CloseableIterator.java:53)
   	at org.apache.iceberg.io.CloseableIterable$3$1.next(CloseableIterable.java:109)
   	at java.base/java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
   	at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:294)
   	at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
   	at java.base/java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
   	at java.base/java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
   	at java.base/java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
   	at com.google.common.collect.Iterators$7.hasNext(Iterators.java:912)
   	at io.trino.plugin.iceberg.IcebergSplitSource.getNextBatch(IcebergSplitSource.java:60)
   	at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorSplitSource.getNextBatch(ClassLoaderSafeConnectorSplitSource.java:44)
   	at io.trino.split.ConnectorAwareSplitSource.getNextBatch(ConnectorAwareSplitSource.java:54)
   	at io.trino.split.BufferingSplitSource$GetNextBatch.fetchSplits(BufferingSplitSource.java:113)
   	at io.trino.split.BufferingSplitSource$GetNextBatch.fetchNextBatchAsync(BufferingSplitSource.java:94)
   	at io.trino.split.BufferingSplitSource.getNextBatch(BufferingSplitSource.java:54)
   	at io.trino.execution.scheduler.SourcePartitionedScheduler.schedule(SourcePartitionedScheduler.java:255)
   	at io.trino.execution.scheduler.SourcePartitionedScheduler$1.schedule(SourcePartitionedScheduler.java:166)
   	at io.trino.execution.scheduler.SqlQueryScheduler.schedule(SqlQueryScheduler.java:550)
   	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
   	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   	at java.base/java.lang.Thread.run(Thread.java:835)
   ``` 
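
   To illustrate the kind of leak the `Unclosed input stream` warning describes, here is a minimal, self-contained sketch. It is not Iceberg or Trino code: `TrackedStream`, `SketchReader`, `leakyRead`, and `safeRead` are hypothetical stand-ins, and the only assumption carried over from the traces is that a reader opens its stream in the constructor (as the `ManifestReader.<init>` frame suggests). It shows that such a reader leaks the stream unless the caller closes it.

   ```java
   import java.io.ByteArrayInputStream;
   import java.util.concurrent.atomic.AtomicInteger;

   public class ReaderCloseSketch {
       // Number of streams that were opened and not yet closed.
       static final AtomicInteger OPEN_STREAMS = new AtomicInteger();

       // Hypothetical stand-in for a stream that tracks whether it was closed.
       static class TrackedStream extends ByteArrayInputStream {
           private boolean closed = false;

           TrackedStream(byte[] data) {
               super(data);
               OPEN_STREAMS.incrementAndGet();
           }

           @Override
           public void close() {
               if (!closed) {
                   closed = true;
                   OPEN_STREAMS.decrementAndGet();
               }
           }
       }

       // Hypothetical reader that opens its stream in the constructor,
       // so the stream stays open until close() is called.
       static class SketchReader implements AutoCloseable {
           private final TrackedStream in;
           final int firstByte;

           SketchReader(byte[] data) {
               this.in = new TrackedStream(data);
               this.firstByte = in.read();
           }

           @Override
           public void close() {
               in.close();
           }
       }

       // Leaks: the reader is dropped without being closed.
       static int leakyRead(byte[] data) {
           SketchReader reader = new SketchReader(data);
           return reader.firstByte;
       }

       // Does not leak: try-with-resources closes the reader on every path.
       static int safeRead(byte[] data) {
           try (SketchReader reader = new SketchReader(data)) {
               return reader.firstByte;
           }
       }

       public static void main(String[] args) {
           byte[] data = {42};
           leakyRead(data);
           System.out.println("open after leakyRead: " + OPEN_STREAMS.get()); // 1
           OPEN_STREAMS.set(0);
           safeRead(data);
           System.out.println("open after safeRead: " + OPEN_STREAMS.get()); // 0
       }
   }
   ```

   Whether the real fix belongs in the code that creates the reader or in the code that consumes its iterator is not something this sketch settles. It only shows why a reader that is constructed but never closed leaves its stream open.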