Posted to issues@nifi.apache.org by "Matt Burgess (JIRA)" <ji...@apache.org> on 2019/04/25 17:23:00 UTC

[jira] [Commented] (NIFI-6241) ConvertRecord Schema Inference fails to infer complete schema, or simply fails

    [ https://issues.apache.org/jira/browse/NIFI-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16826255#comment-16826255 ] 

Matt Burgess commented on NIFI-6241:
------------------------------------

I believe a root tag is needed because the record-based processors are meant to work on flow files containing multiple records. Currently the XMLReader expects a root tag even if there is only one record in the flow file. Perhaps this requirement could be relaxed when there is only one record.
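
As a rough illustration of the multi-record case (plain Python with the standard-library parser, not NiFi's XMLReader code; the element names are made up), sibling records only form a well-formed document once they are wrapped in a single root, and a reader can then treat each child of that root as one record:

```python
import xml.etree.ElementTree as ET

# Hypothetical records; the element names are illustrative only.
bare = "<record><id>1</id></record><record><id>2</id></record>"
wrapped = "<root>" + bare + "</root>"


def read_records(xml_text):
    """Treat each child of the root element as one record."""
    root = ET.fromstring(xml_text)
    return [{field.tag: field.text for field in child} for child in root]


try:
    read_records(bare)
    bare_ok = True
except ET.ParseError:
    # Two top-level elements do not form a well-formed XML document.
    bare_ok = False

records = read_records(wrapped)
```

With this convention a single unwrapped record is ambiguous, because its outer tag would be taken as the root and its fields as records, which may be why the wrapper is required even for one record.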

As for the "missing" fields, it looks to me like no fields were inferred because none of those elements contain explicit values, only self-closing tags with attributes. I think that's expected behavior until we revamp the schema system to support formats that carry metadata about the fields themselves (e.g., XML tag attributes). What fields/values were you expecting? Perhaps we could add a property to extract attributes as fields, or something similar.
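
To make the attribute point concrete, here is a minimal sketch in plain Python (again not NiFi code). The attribute names, the values, and the `include_attributes` switch are all assumptions for illustration; the switch stands in for the kind of property suggested above and is not an existing option. A self-closing tag has no text, so a converter that only looks at element text yields null for it, while one that also reads attributes yields a nested record:

```python
import xml.etree.ElementTree as ET

# Hypothetical input: attribute names and values are made up.
xml_text = """
<root>
  <fltdMessage msgType="example">
    <position latitude="1.0" longitude="2.0"/>
    <speed>450</speed>
  </fltdMessage>
</root>
""".strip()


def to_record(elem, include_attributes=False):
    """Convert an element to a dict, or to its text if it is a plain leaf.

    Simplification: text inside an element that also has attributes or
    children is ignored in this sketch.
    """
    children = list(elem)
    attrs = dict(elem.attrib) if include_attributes else {}
    if not children and not attrs:
        # Leaf with nothing but text: use the text, or None if empty.
        text = (elem.text or "").strip()
        return text or None
    record = dict(attrs)
    for child in children:
        record[child.tag] = to_record(child, include_attributes)
    return record


message = ET.fromstring(xml_text)[0]
without_attrs = to_record(message)
with_attrs = to_record(message, include_attributes=True)
```

Here `without_attrs["position"]` is `None`, mirroring the null fields in the report, whereas `with_attrs["position"]` holds the latitude and longitude values.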

> ConvertRecord Schema Inference fails to infer complete schema, or simply fails
> ------------------------------------------------------------------------------
>
>                 Key: NIFI-6241
>                 URL: https://issues.apache.org/jira/browse/NIFI-6241
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.9.2
>            Reporter: David Sargrad
>            Priority: Major
>         Attachments: Reproduce_ConvertRecord_Shortcoming.xml, image-2019-04-24-13-38-16-605.png, image-2019-04-24-13-39-36-327.png, image-2019-04-24-13-41-00-704.png, image-2019-04-24-13-41-26-860.png, image-2019-04-24-13-43-28-531.png, image-2019-04-24-13-43-59-706.png, image-2019-04-24-17-03-10-728.png, image-2019-04-25-09-13-52-416.png, image-2019-04-25-09-19-15-406.png, image-2019-04-25-09-30-08-297.png
>
>
> I've got a simple test flow as depicted below:
>  
>  
> !image-2019-04-24-13-38-16-605.png!
>  
> The input XML is:
> !image-2019-04-24-13-41-26-860.png!
>  
> The output JSON is almost correct, yet it is missing two critical fields (they both show up as "null"). The null fields are {color:#ff0000}position{color} and {color:#ff0000}ncsmTrackData{color}. It is also missing all of the attributes on fltdMessage.
>  
> !image-2019-04-24-13-41-00-704.png!
>  
> The configuration of my ConvertRecord is:
> !image-2019-04-24-13-43-28-531.png!
>  
> My XMLReader configuration is:
> !image-2019-04-24-13-43-59-706.png!
>  
>  Questions:
>  # Why are these two fields null? 
>  # Why are all the fltdMessage attributes being ignored?
> It would seem that this is a bug, or at least a major shortcoming, in the schema inference capability. If there were a way for me to view the inferred schema, then I could use that as a starting point. However, it's not clear from the documentation how to view that schema.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)