Posted to issues@spark.apache.org by "yihangqiao (Jira)" <ji...@apache.org> on 2022/08/29 06:41:00 UTC

[jira] [Updated] (SPARK-40253) Data read exception in orc format

     [ https://issues.apache.org/jira/browse/SPARK-40253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yihangqiao updated SPARK-40253:
-------------------------------
    Issue Type: Bug  (was: Improvement)

>  Data read exception in orc format
> ----------------------------------
>
>                 Key: SPARK-40253
>                 URL: https://issues.apache.org/jira/browse/SPARK-40253
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.3
>         Environment: os centos7
> spark 2.4.3
> hive 1.2.1
> hadoop 2.7.2
>            Reporter: yihangqiao
>            Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> When running batch jobs with spark-sql using the CREATE TABLE xxx AS SELECT syntax, where the SELECT assigns a constant as a default value (0.00 AS column_name) without specifying its data type, the write succeeds, but the metadata for that field is missing from the resulting ORC file. Because the data type was never stated explicitly, any later query whose selected columns include this field cannot parse the ORC file and fails with the following error:
> Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 1 kind SECONDARY position: 0 length: 0 range: 0 offset: 0 limit: 0
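
The report attributes the failure to the literal's type being left implicit. The snippet below is a minimal sketch of the corresponding workaround of giving the literal an explicit type. It uses only the Python standard library to (a) compute the DECIMAL(precision, scale) that Spark is understood to infer for an untyped literal such as 0.00 (DECIMAL(2,2): the literal's scale, with precision raised to at least the scale), and (b) build a CREATE TABLE ... AS SELECT statement that wraps the literal in an explicit CAST. The helper names, the table name xxx_typed, the DECIMAL(10,2) target type and the STORED AS ORC clause are illustrative assumptions. Whether an explicit cast avoids the read failure on Spark 2.4.3 has not been verified here.

```python
from __future__ import annotations

import re
from decimal import Decimal

# Plain decimal literals only: no exponent and no type suffix such as 0.00BD or 0.00D.
_PLAIN_DECIMAL = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")

# Spark's maximum DECIMAL precision.
MAX_PRECISION = 38


def inferred_decimal_type(literal: str) -> tuple[int, int]:
    """Return (precision, scale) for an untyped decimal literal.

    Assumption: mirrors Spark's rule of using the literal's scale and raising
    the precision to at least the scale, so "0.00" -> (2, 2).
    """
    if not _PLAIN_DECIMAL.fullmatch(literal):
        raise ValueError(f"not a plain decimal literal: {literal!r}")
    _, digits, exponent = Decimal(literal).as_tuple()
    scale = max(-exponent, 0)
    precision = max(len(digits), scale)
    return precision, scale


def typed_ctas(table: str, column: str, literal: str, precision: int, scale: int) -> str:
    """Build a hypothetical CTAS statement that casts the literal to an explicit DECIMAL type."""
    if not (1 <= precision <= MAX_PRECISION and 0 <= scale <= precision):
        raise ValueError("precision must be 1..38 and 0 <= scale <= precision")
    lit_precision, lit_scale = inferred_decimal_type(literal)
    # Refuse targets with fewer fractional or integer digits than the literal as written.
    if scale < lit_scale or precision - scale < lit_precision - lit_scale:
        raise ValueError(f"DECIMAL({precision},{scale}) cannot hold {literal} as written")
    return (
        f"CREATE TABLE {table} STORED AS ORC AS "
        f"SELECT CAST({literal} AS DECIMAL({precision},{scale})) AS {column}"
    )


if __name__ == "__main__":
    print(inferred_decimal_type("0.00"))  # (2, 2)
    # CREATE TABLE xxx_typed STORED AS ORC AS SELECT CAST(0.00 AS DECIMAL(10,2)) AS column_name
    print(typed_ctas("xxx_typed", "column_name", "0.00", 10, 2))
```

The generated statement can be pasted into spark-sql next to the original untyped form. Running DESCRIBE on both tables then shows the column type each one recorded, which helps confirm whether the explicit type changes the read behaviour.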



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
