Posted to issues@spark.apache.org by "Xiao Li (JIRA)" <ji...@apache.org> on 2019/06/30 18:17:00 UTC
[jira] [Assigned] (SPARK-11412) Support merge schema for ORC
[ https://issues.apache.org/jira/browse/SPARK-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiao Li reassigned SPARK-11412:
-------------------------------
Assignee: EdisonWang
> Support merge schema for ORC
> ----------------------------
>
> Key: SPARK-11412
> URL: https://issues.apache.org/jira/browse/SPARK-11412
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 1.6.3, 2.0.0, 2.1.1, 2.2.0
> Reporter: Dave
> Assignee: EdisonWang
> Priority: Major
> Fix For: 3.0.0
>
>
> I tried to load partitioned ORC files whose schemas differ slightly in a nested column.
> For example, the earlier partitions have this column:
>  |-- request: struct (nullable = true)
>  |    |-- datetime: string (nullable = true)
>  |    |-- host: string (nullable = true)
>  |    |-- ip: string (nullable = true)
>  |    |-- referer: string (nullable = true)
>  |    |-- request_uri: string (nullable = true)
>  |    |-- uri: string (nullable = true)
>  |    |-- useragent: string (nullable = true)
> The later partitions also have a page_url_lists attribute in the request struct.
> I tried to load the data with:
>   val s = sqlContext.read.format("orc").option("mergeSchema", "true").load("/data/warehouse/xxxx")
> However, the resulting schema does not include request.page_url_lists.
> Does schema merging not work for ORC?
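
To make the expected result concrete, here is a minimal sketch of what merging the two partition schemas would produce. It is a toy, Spark-free model and not Spark's implementation: SchemaMergeSketch, FieldType, Str, Struct, merge and nestedFieldNames are all hypothetical names, and the type of page_url_lists is not stated in the report, so a string is assumed purely for illustration.

```scala
// Toy, Spark-free model of schema merging. Every name here is hypothetical
// and is not part of Spark's API.
object SchemaMergeSketch {
  sealed trait FieldType
  case object Str extends FieldType
  final case class Struct(fields: Vector[(String, FieldType)]) extends FieldType

  /** Merges two types: structs are unioned field by field; other types must be equal. */
  def merge(left: FieldType, right: FieldType): FieldType = (left, right) match {
    case (Struct(l), Struct(r)) =>
      val rightByName: Map[String, FieldType] = r.toMap
      val leftNames: Set[String] = l.map(_._1).toSet
      // Keep left-hand fields in order, recursively merging those present on both sides.
      val mergedLeft = l.map { case (name, tpe) =>
        (name, rightByName.get(name).map(other => merge(tpe, other)).getOrElse(tpe))
      }
      // Append fields that appear only on the right-hand side.
      val onlyRight = r.filterNot { case (name, _) => leftNames.contains(name) }
      Struct(mergedLeft ++ onlyRight)
    case (a, b) if a == b => a
    case (a, b) =>
      throw new IllegalArgumentException(s"Cannot merge incompatible types: $a and $b")
  }

  /** Names of the fields nested under a top-level struct column, or empty if absent. */
  def nestedFieldNames(schema: FieldType, column: String): Vector[String] = schema match {
    case Struct(fields) =>
      fields.collectFirst { case (`column`, Struct(inner)) => inner.map(_._1) }
        .getOrElse(Vector.empty)
    case _ => Vector.empty
  }

  private val requestFieldNames =
    Vector("datetime", "host", "ip", "referer", "request_uri", "uri", "useragent")

  // Schema of the earlier partitions, as printed in the report.
  val earlySchema: FieldType = Struct(Vector(
    "request" -> Struct(requestFieldNames.map(name => (name, Str: FieldType)))
  ))

  // Schema of the later partitions. The type of page_url_lists is not given
  // in the report; a string is assumed here purely for illustration.
  val lateSchema: FieldType = Struct(Vector(
    "request" -> Struct(
      requestFieldNames.map(name => (name, Str: FieldType)) :+ ("page_url_lists" -> Str)
    )
  ))
}
```

Calling SchemaMergeSketch.merge(SchemaMergeSketch.earlySchema, SchemaMergeSketch.lateSchema) yields a request struct that keeps the seven original fields and appends page_url_lists, which is the schema the report expected to see after loading with mergeSchema enabled.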