Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2023/04/08 21:43:00 UTC
[jira] [Resolved] (SPARK-42976) spark sql Disable vectorized failed
[ https://issues.apache.org/jira/browse/SPARK-42976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim resolved SPARK-42976.
----------------------------------
Resolution: Invalid
Please use the user mailing list for questions. Furthermore, if you read from Iceberg and see a failure, the Iceberg user community is the right place to ask.
> spark sql Disable vectorized failed
> -----------------------------------
>
> Key: SPARK-42976
> URL: https://issues.apache.org/jira/browse/SPARK-42976
> Project: Spark
> Issue Type: Question
> Components: Spark Shell, SQL
> Affects Versions: 3.3.2
> Environment: Spark: 3.3_2.12
> Hive: 3.1.1
> Iceberg: iceberg-spark-runtime-3.3_2.12-1.2.0
>
>
>
> Reporter: liu
> Priority: Minor
> Labels: features
>
> spark-sql launch command:
>
> {code:bash}
> ./spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.2.0 \
> --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
> --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
> --conf spark.sql.catalog.spark_catalog.type=hive \
> --conf spark.sql.iceberg.handle-timestamp-without-timezone=true \
> --conf spark.sql.parquet.binaryAsString=true \
> --conf spark.sql.parquet.enableVectorizedReader=false \
> --conf spark.sql.parquet.enableNestedColumnVectorizedReader=true \
> --conf spark.sql.parquet.recordLevelFilter=true {code}
> Although I have set spark.sql.parquet.enableVectorizedReader=false, querying an Iceberg Parquet table fails with the following error:
>
> {code:java}
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:286)
> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
> at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
> at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
> at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: java.lang.UnsupportedOperationException: Cannot support vectorized reads for column [hzxm] optional binary hzxm = 8 with encoding DELTA_BYTE_ARRAY. Disable vectorized reads to read this table/file
> at org.apache.iceberg.arrow.vectorized.parquet.VectorizedPageIterator.initDataReader(VectorizedPageIterator.java:100)
> at org.apache.iceberg.parquet.BasePageIterator.initFromPage(BasePageIterator.java:140)
> at org.apache.iceberg.parquet.BasePageIterator$1.visit(BasePageIterator.java:105)
> at org.apache.iceberg.parquet.BasePageIterator$1.visit(BasePageIterator.java:96)
> at org.apache.iceberg.shaded.org.apache.parquet.column.page.DataPageV2.accept(DataPageV2.java:192)
> at org.apache.iceberg.parquet.BasePageIterator.setPage(BasePageIterator.java:95)
> at org.apache.iceberg.parquet.BaseColumnIterator.advance(BaseColumnIterator.java:61)
> at org.apache.iceberg.parquet.BaseColumnIterator.setPageSource(BaseColumnIterator.java:50)
> at org.apache.iceberg.arrow.vectorized.parquet.VectorizedColumnIterator.setRowGroupInfo(Vec {code}
> *{color:#FF0000}Caused by: java.lang.UnsupportedOperationException: Cannot support vectorized reads for column [hzxm] optional binary hzxm = 8 with encoding DELTA_BYTE_ARRAY. Disable vectorized reads to read this table/file{color}*
>
> It seems this setting has no effect. How can I turn off vectorized reads so that I can query the table successfully?
>
>
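The trace fails inside org.apache.iceberg.arrow.vectorized.parquet.VectorizedPageIterator, which is Iceberg's own vectorized Parquet reader. This happens even though spark.sql.parquet.enableVectorizedReader=false was set. That flag governs Spark's built-in Parquet data source, which suggests Iceberg decides on vectorization through its own settings.

The sketch below is minimal and models that decision under stated assumptions:
- The keys are a table property read.parquet.vectorization.enabled and a per-read option vectorization-enabled, as listed in the Iceberg configuration docs.
- The read option takes precedence over the table property.
- The default is "enabled". The default is inferred from the report: no Iceberg setting was changed, yet the vectorized reader ran.

The class and method names are hypothetical and the resolution order is simplified. Iceberg 1.2.0 may consult further settings not modelled here, so please verify the keys against the Iceberg documentation or with the Iceberg user community.

```java
import java.util.Map;

// Hypothetical, simplified model of how Iceberg's Spark integration might decide
// whether to use its vectorized Parquet reader. The two keys are assumed Iceberg
// settings; the class, method, and precedence order are illustrative only.
public class IcebergVectorizationSketch {

    // Assumed per-query read option (passed on a single read).
    static final String READ_OPTION = "vectorization-enabled";
    // Assumed table property stored in the Iceberg table metadata.
    static final String TABLE_PROPERTY = "read.parquet.vectorization.enabled";
    // Assumed default: vectorized reads stay on unless something turns them off.
    static final boolean DEFAULT_ENABLED = true;

    // Note: Spark's spark.sql.parquet.enableVectorizedReader is deliberately not a
    // parameter here, mirroring the report where setting it had no effect.
    static boolean vectorizationEnabled(Map<String, String> readOptions,
                                        Map<String, String> tableProperties) {
        // In this sketch the read option wins over the table property.
        String value = readOptions.get(READ_OPTION);
        if (value == null) {
            value = tableProperties.get(TABLE_PROPERTY);
        }
        // Fall back to the assumed default when neither setting is present.
        return value == null ? DEFAULT_ENABLED : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        // Nothing Iceberg-specific set: vectorized reads remain enabled.
        System.out.println(vectorizationEnabled(Map.of(), Map.of())); // prints true

        // Table property set to false: vectorized reads disabled for this table.
        System.out.println(vectorizationEnabled(
                Map.of(), Map.of(TABLE_PROPERTY, "false"))); // prints false

        // A read option overrides the table property for one query.
        System.out.println(vectorizationEnabled(
                Map.of(READ_OPTION, "true"), Map.of(TABLE_PROPERTY, "false"))); // prints true
    }
}
```

Under these assumptions, the per-table switch would be set from Spark SQL with ALTER TABLE ... SET TBLPROPERTIES ('read.parquet.vectorization.enabled'='false').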
--
This message was sent by Atlassian Jira
(v8.20.10#820010)