Posted to user@spark.apache.org by "VON RUEDEN, Jonathan" <jo...@sap.com> on 2016/07/15 12:44:17 UTC

XML

Hi everyone,

I want to read an XML file that has multiple attributes per tag, and I need some help. I can read and process the sample files, but I can't find a solution for my own XML.
Here's the file structure:
<?xml version="1.0" encoding="UTF-8"?>
<report format="1.0">
   <creationTime millis="1468158875331" readable="2016-07-10 13:54:35 +0000" />
   <project artifactid="fin.ap.balances.display" gitUrl="ssh://git.wdf.sap.corp:2/path/path/path" groupid="com.sap.prod.prod" parentArtifactId="name.name" parentVersion="1.12.2" version="4.0.7-SNAPSHOT">
      <check columnNumber="0" context="4.0.6" errorType="PREVIOUS_PROJECT_VERSION" filePath="/hompath/path/path" lineNumber="0" message="Reporting :: Previous version checked for compatibility&#xA;For details, see: https://githudoc.doc.doc.doc.docm.md" severity="Info" />
      <check columnNumber="0" context="Directories in '/src/main/webapp': [WEB-INF, model, view, util, css, img, i18n]" errorType="PROJECT_OLD_STRUCTURE" filePath="/hpathpathpath/ath/webapp" lineNumber="0" message="Reporting :: Using old project structure&#xA;For details, see: https://github.wdf.sap.coath.oath/pathpath.nmd" severity="Info" />
   </project>
</report>


--> Is there any way to have com.databricks.spark.xml write all the attributes into one cell as a string, so that I can come up with my own way of splitting and transforming this into a table? Does anyone know how I can read in such a file?
thanks much,
best,
jonathan
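
For illustration, the "split and transform into a table" step asked about above could be sketched in plain Python, outside of Spark. This is only a rough sketch under assumptions: it uses the standard library's xml.etree.ElementTree rather than com.databricks.spark.xml, the sample string is an abbreviated copy of the file structure above, and the function name flatten_checks and the project_/check_ column prefixes are made up for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the report structure shown above (most attributes dropped).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<report format="1.0">
   <creationTime millis="1468158875331" readable="2016-07-10 13:54:35 +0000" />
   <project artifactid="fin.ap.balances.display" version="4.0.7-SNAPSHOT">
      <check columnNumber="0" errorType="PREVIOUS_PROJECT_VERSION" lineNumber="0" severity="Info" />
      <check columnNumber="0" errorType="PROJECT_OLD_STRUCTURE" lineNumber="0" severity="Info" />
   </project>
</report>
"""


def flatten_checks(xml_bytes):
    """Return one dict per <check>, combined with its parent <project> attributes.

    Hypothetical helper: the name and the column prefixes are illustrative only.
    """
    root = ET.fromstring(xml_bytes)
    rows = []
    for project in root.iter("project"):
        # Prefix the parent's attributes so they cannot clash with check attributes.
        project_attrs = {f"project_{k}": v for k, v in project.attrib.items()}
        for check in project.iter("check"):
            row = dict(project_attrs)
            row.update({f"check_{k}": v for k, v in check.attrib.items()})
            rows.append(row)
    return rows


# Parse from bytes so the encoding declaration in the XML header is honoured.
rows = flatten_checks(SAMPLE.encode("utf-8"))
for row in rows:
    print(row)
```

Each resulting dict is one table row with one column per attribute. If the data has to end up in Spark, a list like this could presumably be handed to spark.createDataFrame; within spark-xml itself, setting the rowTag option to check may be another route worth trying, since the package reads attributes as prefixed columns.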


Jonathan von Rüden
Enterprise Analytics

SAP France | Paris
Mobile: +33 68 221-2425
Email: Jonathan.von.rueden@sap.com