Posted to issues@spark.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/11/29 00:35:00 UTC

[jira] [Updated] (SPARK-46108) XML: keepInnerXmlAsRaw option

     [ https://issues.apache.org/jira/browse/SPARK-46108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-46108:
-----------------------------------
    Labels: pull-request-available  (was: )

> XML: keepInnerXmlAsRaw option
> -----------------------------
>
>                 Key: SPARK-46108
>                 URL: https://issues.apache.org/jira/browse/SPARK-46108
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Ufuk Süngü
>            Priority: Major
>              Labels: pull-request-available
>
> The built-in XML data source returns the values and schema of inner (nested) elements. However, developers still have to perform additional manual steps to convert this unstructured data into a structured, tabular format. If nested elements were kept in an XML-compatible format at each level, they could easily be converted to a structured, tabular format with methods that already exist (the infer method of XmlInferSchema and the parseColumn method of StaxXmlParser). There should therefore be an option, affecting the StaxXmlParser and XmlInferSchema classes, that keeps inner XML elements in their original (raw) format.
> https://github.com/apache/spark/pull/44022
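
To illustrate the idea in the description, here is a minimal sketch in plain Python using only the standard library. It is not Spark code and does not show the API from the pull request. The function parse_row and its keep_inner_xml_as_raw flag are hypothetical names chosen to mirror the issue title. The sketch only shows why keeping a nested element as raw XML lets the same parsing routine be reused on it in a second pass.

```python
import xml.etree.ElementTree as ET


def parse_row(xml_text, keep_inner_xml_as_raw=False):
    """Turn one XML record into a flat dict of column name -> value.

    Hypothetical helper for illustration only; not part of Spark.
    """
    root = ET.fromstring(xml_text)
    row = {}
    for child in root:
        if len(child) > 0 and keep_inner_xml_as_raw:
            # Nested element: keep its XML text so it can be parsed later.
            row[child.tag] = ET.tostring(child, encoding="unicode").strip()
        elif len(child) > 0:
            # Nested element: expand one level into a nested dict right away.
            row[child.tag] = {grandchild.tag: grandchild.text for grandchild in child}
        else:
            # Leaf element: store its text content.
            row[child.tag] = child.text
    return row


record = (
    "<book><title>Spark</title>"
    "<author><first>Ada</first><last>Lovelace</last></author></book>"
)

# First pass: the nested <author> element is kept as a raw XML string.
raw_row = parse_row(record, keep_inner_xml_as_raw=True)

# Second pass: the raw string is itself valid XML, so the same function applies.
author_row = parse_row(raw_row["author"])
```

Because the raw value is itself well-formed XML, each nesting level can be flattened by re-applying the same routine, which is the reuse the description has in mind for the existing infer and parseColumn methods.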



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org