Posted to users@nifi.apache.org by "Jens M. Kofoed" <jm...@gmail.com> on 2020/05/28 06:22:27 UTC

Need help converting xml data

Hi there,
I'm a newbie when it comes to processing records in NiFi, and I'm stuck.
One of my issues is that I don't know the complete schema for the data I have
to process, so I have configured an XMLReader to use Infer Schema. The other
issue is that I have problems converting sub-records. My records look
something like this:
<RootLabel>
    <Part1>
        <name>John Doe</name>
        <adress>some there</adress>
    </Part1>
    <Part2>
        <Job>workingman</Job>
    </Part2>
    <Part3>
        <Details>
            <additionalInfo name="Location">New York</additionalInfo>
            <additionalInfo name="Company">A Company</additionalInfo>
        </Details>
    </Part3>
</RootLabel>

The issues are with the sub-records in Part3. I have configured the
XMLReader property "Field Name for Content" = value

When the data is converted via an XMLWriter, the output for the
additionalInfo fields looks like this:
<Part3>
    <Details>
        <additionalInfo>MapRecord[{name=Location, value=New York}]</additionalInfo>
        <additionalInfo>MapRecord[{name=Company, value=A Company}]</additionalInfo>
    </Details>
</Part3>

If I use a JSONWriter I get this:
"Part3": {
    "Details": {
        "additionalInfo": [ "MapRecord[{name=Location, value=New York}]", "MapRecord[{name=Company, value=A Company}]" ]
    }
}

How do I get the same xml output as the original input?
How can I convert the input to JSON so it looks something like this:
"Part3": {
    "Details": {
        "additionalInfo": {
            "Location": "New York",
            "Company": "A Company"
        }
    }
}

Please help...

Kind regards
Jens M. Kofoed

Re: Need help converting xml data

Posted by Mark Payne <ma...@hotmail.com>.
Hi Jens,

Unfortunately, this looks like a bug in the schema inference for XML. The schema inference appears to be inferring a type of String for the Details, but the XML Reader is actually returning a Record. As a result, the record is turned into a String, which gives you odd output like "MapRecord[{name=Location, value=New York}]".

I have filed a Jira [1] to address this.

Until that is addressed, you may need to provide an explicit schema. The good news is that you can configure the Record Writer to add the schema to the 'avro.schema' attribute. This inferred schema should be *almost* what you need, though not entirely correct, because the additionalInfo element needs to be a Record. But it may get you 90% of the way there.
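
As a rough illustration of what such an explicit schema might look like for the sample record, here is a sketch in Avro JSON. It assumes the element content lands in a field called "value" (the "Field Name for Content" setting from the original post) and that the "name" attribute is read as an ordinary field. The record type names (Part1Type, DetailsType, and so on) are made up for this example, and the whole thing should be checked against the inferred schema written to the attribute rather than taken as is.

```json
{
  "type": "record",
  "name": "RootLabel",
  "fields": [
    {
      "name": "Part1",
      "type": {
        "type": "record",
        "name": "Part1Type",
        "fields": [
          { "name": "name", "type": ["null", "string"], "default": null },
          { "name": "adress", "type": ["null", "string"], "default": null }
        ]
      }
    },
    {
      "name": "Part2",
      "type": {
        "type": "record",
        "name": "Part2Type",
        "fields": [
          { "name": "Job", "type": ["null", "string"], "default": null }
        ]
      }
    },
    {
      "name": "Part3",
      "type": {
        "type": "record",
        "name": "Part3Type",
        "fields": [
          {
            "name": "Details",
            "type": {
              "type": "record",
              "name": "DetailsType",
              "fields": [
                {
                  "name": "additionalInfo",
                  "type": {
                    "type": "array",
                    "items": {
                      "type": "record",
                      "name": "AdditionalInfoType",
                      "fields": [
                        { "name": "name", "type": ["null", "string"], "default": null },
                        { "name": "value", "type": ["null", "string"], "default": null }
                      ]
                    }
                  }
                }
              ]
            }
          }
        ]
      }
    }
  ]
}
```

The part that differs from what inference currently produces is additionalInfo, declared here as an array of records instead of an array of strings. A schema like this would typically be supplied to the reader as schema text in place of Infer Schema.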

Thanks
-Mark

[1] https://issues.apache.org/jira/browse/NIFI-7493
