Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2021/09/06 14:44:47 UTC

[GitHub] [drill] cgivre commented on issue #2307: XML format plugin concatenates attribute values from multiple sub-elements with the same name

cgivre commented on issue #2307:
URL: https://github.com/apache/drill/issues/2307#issuecomment-913701607


   @KendraKrat 
   Thanks for reporting this. The issue is that Drill uses a streaming reader and doesn't know the schema in advance. Drill sees the first field and interprets it as an empty `VARCHAR` field with two attributes. It then sees the next field with the same name, `extra`, and the same attributes, and has no way to determine the intent of the data.
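
To illustrate the ambiguity, here is a minimal Python sketch of a schema-less streaming reader. It is not Drill's code: the sample document, the `read_streaming` function, and the `tag_attribute` column naming are all assumptions made for illustration. It only shows why a reader that maps each attribute to a single scalar column has nowhere to put a second value other than the same column.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: two sibling elements share the name `extra`
# and carry the same attribute names.
SAMPLE = b"""<record>
  <extra key="color" value="red"/>
  <extra key="size" value="large"/>
</record>"""


def read_streaming(xml_bytes):
    """Read one row without a schema, mapping each attribute to one scalar column."""
    row = {}
    # iterparse yields elements as their end tags are seen, so the reader
    # never has the whole document (or a schema) available up front.
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes)):
        for attr, value in elem.attrib.items():
            column = f"{elem.tag}_{attr}"  # illustrative naming, not Drill's
            # A repeated name lands in the same scalar column, so the
            # second value is appended to the first.
            row[column] = row.get(column, "") + value
    return row


print(read_streaming(SAMPLE))
```

In this sketch the two `key` values end up joined in one column, and likewise the two `value` values, which resembles the concatenation described in the issue title.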
   
   I would actually argue that this isn't a great way to format XML, but often we're stuck with what the data provider gives us, so it's a moot point. 
   
   I've thought about adding `list` support to the XML reader, which would partially address this. However, the real fix would be to add provided-schema and XSD support, so that you can explicitly tell Drill what schema to expect.
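
As a rough sketch of how a provided schema could remove the ambiguity, the Python example below takes a set of element names that the caller declares as repeated and collects those into lists instead of scalars. This is not Drill's API or implementation: `read_with_schema`, the `repeated_tags` parameter, and the sample document are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data: two sibling elements share the name `extra`.
SAMPLE = b"""<record>
  <extra key="color" value="red"/>
  <extra key="size" value="large"/>
</record>"""


def read_with_schema(xml_bytes, repeated_tags):
    """Read one row, treating elements named in `repeated_tags` as lists."""
    row = {}
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes)):
        if not elem.attrib:
            continue  # skip elements that carry no attributes
        if elem.tag in repeated_tags:
            # The caller declared this element as repeated, so each
            # occurrence becomes its own entry in a list.
            row.setdefault(elem.tag, []).append(dict(elem.attrib))
        else:
            # Undeclared elements are stored as a single value.
            row[elem.tag] = dict(elem.attrib)
    return row


print(read_with_schema(SAMPLE, repeated_tags={"extra"}))
```

Because the reader is told in advance that `extra` repeats, each occurrence keeps its own attribute values rather than being merged into one field.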

