Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/24 20:41:47 UTC

[GitHub] [iceberg] dubeme opened a new issue #2631: UnsupportedOperationException: Byte-buffer read unsupported by input stream

dubeme opened a new issue #2631:
URL: https://github.com/apache/iceberg/issues/2631


   Hi,
   
   Could someone please help me with this issue? I successfully wrote an Iceberg table called `cat.table`, but when I query it I get the error `UnsupportedOperationException: Byte-buffer read unsupported by input stream`. More details below:
   
   ### Setup
   
   * Databricks - Runtime 7.6 ML
   * Spark - Version 3.0.1
   * Iceberg - iceberg-spark3-runtime 0.11.1
   * Storage - Azure Data Lake Storage (accessed through the Hadoop FileSystem API)
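
   The report does not show how the `cat` catalog is registered. Presumably it is something along these lines in the cluster's Spark configuration (the property names come from the Iceberg 0.11 docs; the Hadoop catalog type and the warehouse path are assumptions about this environment, and a Hive catalog setup would look different):

   ```
   spark.sql.extensions              org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   spark.sql.catalog.cat             org.apache.iceberg.spark.SparkCatalog
   spark.sql.catalog.cat.type        hadoop
   spark.sql.catalog.cat.warehouse   abfss://<container>@<account>.dfs.core.windows.net/warehouse
   ```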
   
   ### Schema
   Column name | Type
   ---|---
   col_00 | bigint
   col_01 | bigint
   col_02 | string
   col_03 | double
   col_04 | double
   col_05 | double
   col_06 | decimal(28,8)
   col_07 | decimal(28,8)
   col_08 | map<bigint,decimal(28,8)>
   col_09 | timestamp
   col_10 | timestamp
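
   For reference, here is a Spark SQL DDL sketch matching the schema above. How the table was actually created (partitioning, table properties, write path) is not shown in this report, so treat it only as an assumption:

   ```sql
   -- Hypothetical DDL matching the schema above; the real table's partitioning and properties are unknown
   CREATE TABLE cat.table (
     col_00 bigint,
     col_01 bigint,
     col_02 string,
     col_03 double,
     col_04 double,
     col_05 double,
     col_06 decimal(28,8),
     col_07 decimal(28,8),
     col_08 map<bigint,decimal(28,8)>,
     col_09 timestamp,
     col_10 timestamp
   ) USING iceberg
   ```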
   
   ### Query that runs
   ```sql
   SELECT * FROM cat.table
   ```
   
   ### Queries that throw the exception
   ```sql
   
   -- Query 1
   SELECT * FROM cat.table WHERE col_09 > date_sub(current_timestamp(), 30) 
   
   -- Query 2
   SELECT col_00, COUNT(*) AS count
   FROM cat.table
   WHERE col_09 > date_sub(current_timestamp(), 30) 
   GROUP BY col_00
   ORDER BY col_00
   
   ```
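
   Note that `date_sub(current_timestamp(), 30)` evaluates to a `date` in Spark, so the comparison with the `timestamp` column `col_09` goes through an implicit cast. Purely as a rephrasing of the filter above (not a known workaround), the same 30-day predicate can be written with an explicit interval so both sides stay timestamps:

   ```sql
   -- Same 30-day filter, keeping both sides of the comparison as timestamps
   SELECT * FROM cat.table WHERE col_09 > current_timestamp() - INTERVAL 30 DAYS
   ```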
   
   ### Stack dump
   
   
   ```
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 32.0 failed 4 times, most recent failure: Lost task 0.3 in stage 32.0 (TID 1454, <redacted ip address>, executor 6): java.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream
   	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:146)
   	at com.databricks.spark.metrics.FSInputStreamWithMetrics.$anonfun$read$1(FileSystemWithMetrics.scala:198)
   	at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
   	at com.databricks.spark.metrics.ExtendedTaskIOMetrics.withTimeMetric(FileSystemWithMetrics.scala:151)
   	at com.databricks.spark.metrics.ExtendedTaskIOMetrics.withTimeAndBytesMetric(FileSystemWithMetrics.scala:171)
   	at com.databricks.spark.metrics.ExtendedTaskIOMetrics.$anonfun$withTimeAndBytesReadMetric$1(FileSystemWithMetrics.scala:185)
   	at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
   	at com.databricks.spark.metrics.SamplerWithPeriod.sample(FileSystemWithMetrics.scala:78)
   	at com.databricks.spark.metrics.ExtendedTaskIOMetrics.withTimeAndBytesReadMetric(FileSystemWithMetrics.scala:185)
   	at com.databricks.spark.metrics.ExtendedTaskIOMetrics.withTimeAndBytesReadMetric$(FileSystemWithMetrics.scala:184)
   	at com.databricks.spark.metrics.FSInputStreamWithMetrics.withTimeAndBytesReadMetric(FileSystemWithMetrics.scala:192)
   	at com.databricks.spark.metrics.FSInputStreamWithMetrics.read(FileSystemWithMetrics.scala:198)
   	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:143)
   	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.util.H2SeekableInputStream$H2Reader.read(H2SeekableInputStream.java:81)
   	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:90)
   	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:75)
   	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:542)
   	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:712)
   	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:609)
   	at org.apache.iceberg.parquet.ReadConf.newReader(ReadConf.java:218)
   	at org.apache.iceberg.parquet.ReadConf.<init>(ReadConf.java:74)
   	at org.apache.iceberg.parquet.ParquetReader.init(ParquetReader.java:66)
   	at org.apache.iceberg.parquet.ParquetReader.iterator(ParquetReader.java:77)
   	at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:95)
   	at org.apache.iceberg.spark.source.BaseDataReader.next(BaseDataReader.java:93)
   	at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:79)
   	at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:112)
   	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
   	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
   	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
   	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:732)
   	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
   	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:187)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
   	at org.apache.spark.scheduler.Task.run(Task.scala:117)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$8(Executor.scala:677)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:680)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   
   Driver stacktrace:
   	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2519)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2466)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2460)
   	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
   	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
   	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
   	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2460)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1152)
   	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1152)
   	at scala.Option.foreach(Option.scala:407)
   	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1152)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2721)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2668)
   	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2656)
   	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
   	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:938)
   	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2339)
   	at org.apache.spark.sql.execution.collect.Collector.runSparkJobs(Collector.scala:298)
   	at org.apache.spark.sql.execution.collect.Collector.collect(Collector.scala:308)
   	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:82)
   	at org.apache.spark.sql.execution.collect.Collector$.collect(Collector.scala:88)
   	at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:61)
   	at org.apache.spark.sql.execution.collect.InternalRowFormat$.collect(cachedSparkResults.scala:57)
   	at org.apache.spark.sql.execution.ResultCacheManager.$anonfun$getOrComputeResultInternal$1(ResultCacheManager.scala:483)
   	at scala.Option.getOrElse(Option.scala:189)
   	at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResultInternal(ResultCacheManager.scala:483)
   	at org.apache.spark.sql.execution.ResultCacheManager.getOrComputeResult(ResultCacheManager.scala:427)
   	at org.apache.spark.sql.execution.CollectLimitExec.executeCollectResult(limit.scala:58)
   	at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:3013)
   	at org.apache.spark.sql.Dataset.$anonfun$collectResult$1(Dataset.scala:3004)
   	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3728)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:116)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:248)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:101)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:841)
   	at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:77)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:198)
   	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3726)
   	at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:3003)
   	at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation0(OutputAggregator.scala:194)
   	at com.databricks.backend.daemon.driver.OutputAggregator$.withOutputAggregation(OutputAggregator.scala:57)
   	at com.databricks.backend.daemon.driver.SQLDriverLocal.executeSql(SQLDriverLocal.scala:115)
   	at com.databricks.backend.daemon.driver.SQLDriverLocal.repl(SQLDriverLocal.scala:144)
   	at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$11(DriverLocal.scala:488)
   	at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:240)
   	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
   	at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:235)
   	at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:232)
   	at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:50)
   	at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:277)
   	at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:270)
   	at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:50)
   	at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:465)
   	at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:690)
   	at scala.util.Try$.apply(Try.scala:213)
   	at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:682)
   	at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:523)
   	at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:635)
   	at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:428)
   	at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:371)
   	at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:223)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.UnsupportedOperationException: Byte-buffer read unsupported by input stream
   	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:146)
   	at com.databricks.spark.metrics.FSInputStreamWithMetrics.$anonfun$read$1(FileSystemWithMetrics.scala:198)
   	at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
   	at com.databricks.spark.metrics.ExtendedTaskIOMetrics.withTimeMetric(FileSystemWithMetrics.scala:151)
   	at com.databricks.spark.metrics.ExtendedTaskIOMetrics.withTimeAndBytesMetric(FileSystemWithMetrics.scala:171)
   	at com.databricks.spark.metrics.ExtendedTaskIOMetrics.$anonfun$withTimeAndBytesReadMetric$1(FileSystemWithMetrics.scala:185)
   	at scala.runtime.java8.JFunction0$mcI$sp.apply(JFunction0$mcI$sp.java:23)
   	at com.databricks.spark.metrics.SamplerWithPeriod.sample(FileSystemWithMetrics.scala:78)
   	at com.databricks.spark.metrics.ExtendedTaskIOMetrics.withTimeAndBytesReadMetric(FileSystemWithMetrics.scala:185)
   	at com.databricks.spark.metrics.ExtendedTaskIOMetrics.withTimeAndBytesReadMetric$(FileSystemWithMetrics.scala:184)
   	at com.databricks.spark.metrics.FSInputStreamWithMetrics.withTimeAndBytesReadMetric(FileSystemWithMetrics.scala:192)
   	at com.databricks.spark.metrics.FSInputStreamWithMetrics.read(FileSystemWithMetrics.scala:198)
   	at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:143)
   	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.util.H2SeekableInputStream$H2Reader.read(H2SeekableInputStream.java:81)
   	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:90)
   	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:75)
   	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:542)
   	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:712)
   	at org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:609)
   	at org.apache.iceberg.parquet.ReadConf.newReader(ReadConf.java:218)
   	at org.apache.iceberg.parquet.ReadConf.<init>(ReadConf.java:74)
   	at org.apache.iceberg.parquet.ParquetReader.init(ParquetReader.java:66)
   	at org.apache.iceberg.parquet.ParquetReader.iterator(ParquetReader.java:77)
   	at org.apache.iceberg.spark.source.RowDataReader.open(RowDataReader.java:95)
   	at org.apache.iceberg.spark.source.BaseDataReader.next(BaseDataReader.java:93)
   	at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:79)
   	at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:112)
   	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
   	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
   	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
   	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
   	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:732)
   	at org.apache.spark.sql.execution.collect.UnsafeRowBatchUtils$.encodeUnsafeRows(UnsafeRowBatchUtils.scala:80)
   	at org.apache.spark.sql.execution.collect.Collector.$anonfun$processFunc$1(Collector.scala:187)
   	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
   	at org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
   	at org.apache.spark.scheduler.Task.run(Task.scala:117)
   	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$8(Executor.scala:677)
   	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1581)
   	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:680)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	... 1 more
   ```

