Posted to user@spark.apache.org by Arun Patel <ar...@gmail.com> on 2016/11/03 21:37:42 UTC

Spark XML ignore namespaces

I see that 'ignoring namespaces' issue is resolved.

https://github.com/databricks/spark-xml/pull/75

How do we enable this option and ignore namespace prefixes?

- Arun

Re: Spark XML ignore namespaces

Posted by Hyukjin Kwon <gu...@gmail.com>.
Oh, that PR was actually about not handling namespaces at all (meaning
leaving the data as it is, including prefixes).


The problem is that each partition needs to produce its records with
knowledge of the namespaces.

It is fine to deal with them if they are declared within each XML document
(represented as a row in the DataFrame), but

it becomes problematic if they are declared in the parent of each XML
document (represented as a row in the DataFrame).
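To illustrate the problem described above (not spark-xml code, just a minimal standalone sketch using Python's stdlib XML parser): when the namespace declaration lives on the parent element, a fragment extracted from the middle of the file cannot resolve its own prefixes, which is exactly the situation a partition faces.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the parent (root) element.
doc = """<ns:books xmlns:ns="http://example.com/ns">
  <ns:book><ns:title>Spark</ns:title></ns:book>
</ns:books>"""

# Parsed with the full document in context, the prefix resolves fine:
root = ET.fromstring(doc)
print(root[0].tag)  # {http://example.com/ns}book

# But a partition that only sees the <ns:book> fragment cannot
# resolve 'ns', because the declaration is on the parent:
fragment = "<ns:book><ns:title>Spark</ns:title></ns:book>"
try:
    ET.fromstring(fragment)
except ET.ParseError as err:
    print("cannot parse fragment:", err)  # unbound prefix
```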


There is an issue open for this,
https://github.com/databricks/spark-xml/issues/74

It would be nicer to have an option to enable/disable this, once we can
properly support namespace handling.


We can continue the discussion there.



2016-11-04 6:37 GMT+09:00 Arun Patel <ar...@gmail.com>:

> I see that 'ignoring namespaces' issue is resolved.
>
> https://github.com/databricks/spark-xml/pull/75
>
> How do we enable this option and ignore namespace prefixes?
>
> - Arun
>