Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/30 07:57:35 UTC

[GitHub] [iceberg] hameizi opened a new issue #2896: Spark: read migrate table error

hameizi opened a new issue #2896:
URL: https://github.com/apache/iceberg/issues/2896


   I am using the "table migration" feature in spark-iceberg and executed the SQL below, but when I read the migrated table I get an exception like this: "Previous exception in task: tried to release a buffer that was not created by this stream, java.nio.DirectByteBuffer[pos=0 lim=7 cap=13]"
   `CALL hive.system.snapshot('spark_catalog.flink.hive_adv_game_recommend_test17', 'hive.flink.iceberg2_hive_adv_game_recommend_test17');`
   `select uuid from hive.flink.iceberg2_hive_adv_game_recommend_test17;`
   
   The full exception info is:
   `21/07/30 15:45:27 ERROR source.BaseDataReader: Error reading file: hdfs://stream/data/hive/meta/root/hive-databases/flink.db/hive_adv_game_recommend_test17/pt=test/result-1627524904535.orc
   java.lang.IllegalArgumentException: tried to release a buffer that was not created by this stream, java.nio.DirectByteBuffer[pos=0 lim=7 cap=13]
           at org.apache.hadoop.hdfs.DFSInputStream.releaseBuffer(DFSInputStream.java:1847)
           at org.apache.hadoop.fs.FSDataInputStream.releaseBuffer(FSDataInputStream.java:212)
           at org.apache.iceberg.shaded.org.apache.orc.impl.ZeroCopyShims$ZeroCopyAdapter.releaseBuffer(ZeroCopyShims.java:75)
           at org.apache.iceberg.shaded.org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.releaseBuffer(RecordReaderUtils.java:132)
           at org.apache.iceberg.shaded.org.apache.orc.impl.reader.StripePlanner$StreamInformation.releaseBuffers(StripePlanner.java:518)
           at org.apache.iceberg.shaded.org.apache.orc.impl.reader.StripePlanner.clearStreams(StripePlanner.java:183)
           at org.apache.iceberg.shaded.org.apache.orc.impl.RecordReaderImpl.clearStreams(RecordReaderImpl.java:1090)
           at org.apache.iceberg.shaded.org.apache.orc.impl.RecordReaderImpl.close(RecordReaderImpl.java:1266)
           at org.apache.iceberg.orc.VectorizedRowBatchIterator.close(VectorizedRowBatchIterator.java:50)
           at org.apache.iceberg.orc.OrcIterable$OrcRowIterator.close(OrcIterable.java:169)
           at org.apache.iceberg.io.FilterIterator.close(FilterIterator.java:83)
           at org.apache.iceberg.io.FilterIterator.advance(FilterIterator.java:73)
           at org.apache.iceberg.io.FilterIterator.hasNext(FilterIterator.java:50)
           at org.apache.iceberg.spark.source.BaseDataReader.next(BaseDataReader.java:87)
           at org.apache.spark.sql.execution.datasources.v2.PartitionIterator.hasNext(DataSourceRDD.scala:79)
           at org.apache.spark.sql.execution.datasources.v2.MetricsIterator.hasNext(DataSourceRDD.scala:112)
           at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
           at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:458)
           at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
           at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
           at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
           at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:345)
           at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:872)
           at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:872)
           at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
           at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
           at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
           at org.apache.spark.scheduler.Task.run(Task.scala:127)
           at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:462)
           at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
           at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:465)
           at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
           at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
           at java.lang.Thread.run(Thread.java:745)
   `


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] hameizi commented on issue #2896: Spark: read migrate table error

Posted by GitBox <gi...@apache.org>.

hameizi commented on issue #2896:
URL: https://github.com/apache/iceberg/issues/2896#issuecomment-890735684


   > And the schema of the actual files? And could you select just one column at a time from the table and see if any of them read correctly?
   
   @RussellSpitzer The schema of the actual files is correct too. Selecting just one column also gives the same exception. Removing the partition column, which does not physically exist in the Hive table files, does not solve the exception either.



[GitHub] [iceberg] hameizi commented on issue #2896: Spark: read migrate table error

Posted by GitBox <gi...@apache.org>.

hameizi commented on issue #2896:
URL: https://github.com/apache/iceberg/issues/2896#issuecomment-889716701


   @aokolnychyi Could you help take a look?



[GitHub] [iceberg] RussellSpitzer commented on issue #2896: Spark: read migrate table error

Posted by GitBox <gi...@apache.org>.

RussellSpitzer commented on issue #2896:
URL: https://github.com/apache/iceberg/issues/2896#issuecomment-889875773


   My guess would be a schema issue during import. Can you validate the schema created by iceberg and the schema of the original files?
   
   I would also go one by one through the columns (pruning out all but one column) and determine if there is an alignment issue or whether it is just one column.
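The column-by-column check suggested above can be sketched as a small script that builds one query per column. This is only an illustration: the column list is taken from the DESC output posted later in this thread and is assumed to match the snapshot table from the original report, and the `LIMIT` clause and the helper name `single_column_queries` are hypothetical additions, not something from the thread.

```python
# Snapshot table name from the original report.
TABLE = "hive.flink.iceberg2_hive_adv_game_recommend_test17"

# Column names as shown in the DESC output in this thread
# (assumption: the snapshot table has the same columns).
COLUMNS = ["gameidfrom", "guid", "uuid", "userid", "useridfrom", "manageid", "pt"]


def single_column_queries(table, columns, limit=10):
    """Return one SELECT statement per column so each can be read in isolation."""
    return [f"SELECT {col} FROM {table} LIMIT {limit}" for col in columns]


if __name__ == "__main__":
    # Print the statements; each one can then be run separately in Spark SQL.
    for query in single_column_queries(TABLE, COLUMNS):
        print(query)
```

Running each statement on its own (for example through `spark.sql(...)` in a PySpark session) should show whether the failure is tied to one column or affects all of them.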



[GitHub] [iceberg] hameizi edited a comment on issue #2896: Spark: read migrate table error

Posted by GitBox <gi...@apache.org>.

hameizi edited a comment on issue #2896:
URL: https://github.com/apache/iceberg/issues/2896#issuecomment-890662602


   > My guess would be a schema issue during import. Can you validate the schema created by iceberg and the schema of the original files?
   > 
   > I would also go one by one through the columns (pruning out all but one column) and determine if there is an alignment issue or whether it is just one column.
   
   @RussellSpitzer Thanks for your reply. The schema is correct, as shown below, but I get the same exception too.
   Flink SQL> desc flink.hive_adv_game_recommend_test19;
   +------------+--------+------+-----+--------+-----------+
   |       name |   type | null | key | extras | watermark |
   +------------+--------+------+-----+--------+-----------+
   | gameidfrom |    INT | true |     |        |           |
   |       guid | STRING | true |     |        |           |
   |       uuid | STRING | true |     |        |           |
   |     userid | STRING | true |     |        |           |
   | useridfrom |    INT | true |     |        |           |
   |   manageid | BIGINT | true |     |        |           |
   |         pt | STRING | true |     |        |           |
   +------------+--------+------+-----+--------+-----------+
   7 rows in set
   
   Flink SQL> desc iceberg_hive_catalog.flink.iceberg_hive_adv_game_recommend_test19;
   +------------+--------+------+-----+--------+-----------+
   |       name |   type | null | key | extras | watermark |
   +------------+--------+------+-----+--------+-----------+
   | gameidfrom |    INT | true |     |        |           |
   |       guid | STRING | true |     |        |           |
   |       uuid | STRING | true |     |        |           |
   |     userid | STRING | true |     |        |           |
   | useridfrom |    INT | true |     |        |           |
   |   manageid | BIGINT | true |     |        |           |
   |         pt | STRING | true |     |        |           |
   +------------+--------+------+-----+--------+-----------+
   7 rows in set
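A comparison like the one above can also be done mechanically. The following is a minimal sketch, assuming the DESC output has been copied as text from the Flink SQL client; the helper names `parse_desc` and `schema_diff` are hypothetical. It only compares the table-level schemas reported by DESC and does not inspect the schema stored in the ORC files themselves.

```python
# DESC output as posted in this thread. Both tables printed the same rows,
# so the same text is reused for the Hive table and the Iceberg table.
HIVE_DESC = """
+------------+--------+------+-----+--------+-----------+
|       name |   type | null | key | extras | watermark |
+------------+--------+------+-----+--------+-----------+
| gameidfrom |    INT | true |     |        |           |
|       guid | STRING | true |     |        |           |
|       uuid | STRING | true |     |        |           |
|     userid | STRING | true |     |        |           |
| useridfrom |    INT | true |     |        |           |
|   manageid | BIGINT | true |     |        |           |
|         pt | STRING | true |     |        |           |
+------------+--------+------+-----+--------+-----------+
7 rows in set
"""
ICEBERG_DESC = HIVE_DESC


def parse_desc(text):
    """Parse DESC output into an ordered list of (name, type) pairs."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines and the "rows in set" trailer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if cells[0] == "name":
            continue  # skip the header row
        fields.append((cells[0], cells[1]))
    return fields


def schema_diff(source, target):
    """Return (position, source_field, target_field) for every mismatch."""
    diffs = []
    for i in range(max(len(source), len(target))):
        a = source[i] if i < len(source) else None
        b = target[i] if i < len(target) else None
        if a != b:
            diffs.append((i, a, b))
    return diffs


if __name__ == "__main__":
    # An empty list means names, types, and column order all match.
    print(schema_diff(parse_desc(HIVE_DESC), parse_desc(ICEBERG_DESC)))
```

Comparing by position as well as by name is deliberate here, since a column-order mismatch would be one form of the alignment issue raised earlier in the thread.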



[GitHub] [iceberg] RussellSpitzer commented on issue #2896: Spark: read migrate table error

Posted by GitBox <gi...@apache.org>.

RussellSpitzer commented on issue #2896:
URL: https://github.com/apache/iceberg/issues/2896#issuecomment-890665719


   @hameizi And the schema of the actual files? And could you select just one column at a time from the table and see if any of them read correctly?



[GitHub] [iceberg] hameizi commented on issue #2896: Spark: read migrate table error

Posted by GitBox <gi...@apache.org>.

hameizi commented on issue #2896:
URL: https://github.com/apache/iceberg/issues/2896#issuecomment-890662602


   > My guess would be a schema issue during import. Can you validate the schema created by iceberg and the schema of the original files?
   > 
   > I would also go one by one through the columns (pruning out all but one column) and determine if there is an alignment issue or whether it is just one column.
   
   The schema is correct, as shown below:
   Flink SQL> desc flink.hive_adv_game_recommend_test19;
   +------------+--------+------+-----+--------+-----------+
   |       name |   type | null | key | extras | watermark |
   +------------+--------+------+-----+--------+-----------+
   | gameidfrom |    INT | true |     |        |           |
   |       guid | STRING | true |     |        |           |
   |       uuid | STRING | true |     |        |           |
   |     userid | STRING | true |     |        |           |
   | useridfrom |    INT | true |     |        |           |
   |   manageid | BIGINT | true |     |        |           |
   |         pt | STRING | true |     |        |           |
   +------------+--------+------+-----+--------+-----------+
   7 rows in set
   
   Flink SQL> desc iceberg_hive_catalog.flink.iceberg_hive_adv_game_recommend_test19;
   +------------+--------+------+-----+--------+-----------+
   |       name |   type | null | key | extras | watermark |
   +------------+--------+------+-----+--------+-----------+
   | gameidfrom |    INT | true |     |        |           |
   |       guid | STRING | true |     |        |           |
   |       uuid | STRING | true |     |        |           |
   |     userid | STRING | true |     |        |           |
   | useridfrom |    INT | true |     |        |           |
   |   manageid | BIGINT | true |     |        |           |
   |         pt | STRING | true |     |        |           |
   +------------+--------+------+-----+--------+-----------+
   7 rows in set

