Posted to issues@nifi.apache.org by "Andrew Chafos (Jira)" <ji...@apache.org> on 2020/07/31 17:25:00 UTC

[jira] [Created] (NIFI-7697) NiFi XMLReader Record Component sometimes ignores empty XML Elements

Andrew Chafos created NIFI-7697:
-----------------------------------

             Summary: NiFi XMLReader Record Component sometimes ignores empty XML Elements
                 Key: NIFI-7697
                 URL: https://issues.apache.org/jira/browse/NIFI-7697
             Project: Apache NiFi
          Issue Type: Bug
          Components: Core Framework
    Affects Versions: 1.11.4
         Environment: Windows 10
            Reporter: Andrew Chafos


I am currently developing a processor for Apache NiFi that depends on being configured with a RecordReaderFactory implementation that produces well-formed NiFi Records from its input data.

The JsonTreeReader component produced accurate results for all of my test cases. However, at least with its default configuration, the XMLReader component sometimes mishandles data: it drops empty XML elements that are nested inside XML elements represented as Arrays in NiFi Records.

This occurs when I test with the standard ConvertRecord NiFi Processor, with the Reader set to XMLReader and the Writer set to JsonRecordSetWriter.

These first 2 test cases work as expected:

*Test Case 1:*

Input XML:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<Root>
   <DataArr>SomeData</DataArr>
   <DataArr>
      <Field>
         <NonEmptyField>2</NonEmptyField>
      </Field>
   </DataArr>
</Root>
{code}
Output Json:
{code:json}
[
   {
      "DataArr":[
         "SomeData",
         "MapRecord[{Field=MapRecord[{NonEmptyField=2}]}]"
      ]
   }
]
{code}
*Test Case 2:*

Input XML:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<Root>
   <SomeData />
   <MoreData>2</MoreData>
</Root>
{code}

Output Json:
{code:json}
[
   {
      "SomeData":null,
      "MoreData":2
   }
]
{code}

However, the following does *not* work as expected:

*Test Case 3:*

Input XML:
{code:xml}
<Root>
   <DataArr>SomeData</DataArr>
   <DataArr>
      <Field>
         <EmptyField/>
      </Field>
   </DataArr>
</Root>
{code}

Output Json:
{code:json}
[
   {
      "DataArr":[
         "SomeData"
      ]
   }
]
{code}

For my Processor to function, Field and EmptyField must appear in the JSON output for Test Case 3 and for all other inputs analogous to this case.
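
As a sanity check that the empty element really is present in the Test Case 3 input, the following minimal sketch parses the same XML with the JDK's built-in DOM parser (not NiFi's XMLReader) and lists every element that has no child nodes. The class name EmptyElementCheck and the helper emptyElementPaths are hypothetical names introduced only for this illustration; the sketch makes no claim about how XMLReader works internally.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only; it is not part of NiFi.
public class EmptyElementCheck {

    // Test Case 3 input, written without whitespace between elements.
    public static final String TEST_CASE_3 =
            "<Root><DataArr>SomeData</DataArr><DataArr><Field><EmptyField/></Field></DataArr></Root>";

    // Returns the slash-separated path of every element that has no child nodes.
    public static List<String> emptyElementPaths(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> paths = new ArrayList<>();
        collect(doc.getDocumentElement(), "", paths);
        return paths;
    }

    // Depth-first walk that records elements without any child nodes.
    private static void collect(Element element, String parentPath, List<String> paths) {
        String path = parentPath + "/" + element.getTagName();
        NodeList children = element.getChildNodes();
        if (children.getLength() == 0) {
            paths.add(path);
        }
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                collect((Element) child, path, paths);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Prints [/Root/DataArr/Field/EmptyField]
        System.out.println(emptyElementPaths(TEST_CASE_3));
    }
}
```

A generic parser therefore sees EmptyField under Root/DataArr/Field, which is the element missing from the ConvertRecord output above.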

I also tried supplying a custom NiFi RecordSchema to the components and verified that it was being used, but I got the same results.

Is there a way to configure these controller services so that this empty field is not ignored, or is this a bug in the XMLReader component?

You can reproduce these results by running ConvertRecord in NiFi as described above, or by running the following JUnit test with testXml replaced by the XML of the test case in question:

{code:java}
import org.apache.nifi.controller.ControllerService;
import org.apache.nifi.json.JsonRecordSetWriter;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processors.standard.ConvertRecord;
import org.apache.nifi.reporting.InitializationException;
import org.apache.nifi.util.MockFlowFile;
import org.apache.nifi.util.TestRunner;
import org.apache.nifi.util.TestRunners;
import org.apache.nifi.xml.XMLReader;
import org.junit.Test;

public class TestNiFiMinimal {
    @Test
    public void testEmptyXMLGetsProcessed() throws InitializationException {
        ConvertRecord convertRecord = new ConvertRecord();
        TestRunner testRunner = TestRunners.newTestRunner(convertRecord);
        ControllerService xmlReader = new XMLReader();
        testRunner.addControllerService("xmlReader", xmlReader);
        testRunner.enableControllerService(xmlReader);
        testRunner.setProperty("record-reader", "xmlReader");
        ControllerService jsonWriter = new JsonRecordSetWriter();
        testRunner.addControllerService("jsonWriter", jsonWriter);
        testRunner.enableControllerService(jsonWriter);
        testRunner.setProperty("record-writer", "jsonWriter");
        String testXml = "<?xml version='1.0' encoding='UTF-8'?><Root><DataArr>SomeData</DataArr><DataArr><Field><EmptyField/></Field></DataArr></Root>";
        testRunner.enqueue(testXml);
        testRunner.run();
        Relationship success = convertRecord.getRelationships().stream().filter(relationship -> relationship.getName().equals("success")).findAny().get();
        testRunner.assertAllFlowFilesTransferred(success);
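        // The converted JSON is the content of the flow file routed to "success".
        final MockFlowFile output = testRunner.getFlowFilesForRelationship(success).get(0);
        // Placeholder assertion: comparing against "" is expected to fail and report the actual JSON output.
        output.assertContentEquals("");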
    }
}
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)