Posted to issues@nifi.apache.org by stevedlawrence <gi...@git.apache.org> on 2018/11/05 18:28:03 UTC

[GitHub] nifi issue #3130: NIFI-5791: Add Apache Daffodil (incubating) bundle

Github user stevedlawrence commented on the issue:

    https://github.com/apache/nifi/pull/3130
  
    Documentation about this new processor:
    
    [Apache Daffodil (incubating)](https://daffodil.apache.org) is an open source implementation of the [Data Format Description Language (DFDL)](https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl). DFDL is a language capable of describing many data formats, including textual and binary, commercial record-oriented, scientific and numeric, modern and legacy, and many industry standards. It leverages XML technology and concepts, using a subset of the W3C XML Schema type system and annotations to describe such data. Daffodil uses this data description to "parse" data into an XML or JSON representation of the data. This allows one to take advantage of the many XML and JSON technologies (e.g. XQuery, XPath, XSLT) to ingest, validate, and manipulate complex data formats. Daffodil can also use this data description to "unparse", or serialize, the XML or JSON representation back to the original data format.
    
    This PR provides a new Daffodil bundle containing DaffodilParse and DaffodilUnparse processors.
    
    For an example of its usage, I've provided a NiFi template here:
    
    https://gist.github.com/stevedlawrence/5a8259c9fffb3cb3b317ba31a6ef0494#file-daffodil_pcap_filter_nifi_template-xml
    
    The template looks like this:
    
    ![nifi-daffodil-pcap-filter](https://user-images.githubusercontent.com/3180601/48017874-6a536c80-e0fd-11e8-9aa0-3eb785a99157.png)
    
    
    To set the environment up to work with this template, perform the following:
    ```
    mkdir -p /tmp/nifi/{getfile,putfile}
    git clone https://github.com/DFDLSchemas/PCAP.git /tmp/nifi/PCAP
    curl "https://gist.githubusercontent.com/stevedlawrence/5a8259c9fffb3cb3b317ba31a6ef0494/raw/c19fddd6d1e73777e10a549bfd369b077aefbb50/pcap-filter.xsl" > /tmp/nifi/pcap-filter.xsl
    ```
    The template has 5 processors in a single pipeline that performs the following steps:
    1. **GetFile** - Reads a PCAP file from ``/tmp/nifi/getfile``
    1. **DaffodilParse** - Parses the PCAP file to an XML representation
    1. **TransformXML** - Removes all XML elements that have an IP address of ``192.168.170.8``
    1. **DaffodilUnparse** - Unparses the filtered XML back to PCAP file format
    1. **PutFile** - Writes the filtered PCAP file to ``/tmp/nifi/putfile``
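    
    As a rough illustration of what the **TransformXML** step does, the filtering logic can be sketched in Python. This is not the XSLT from the gist: the element names (`PCAP`, `Packet`, `IPSrc`, `IPDest`) and the helper `drop_packets_with_ip` are hypothetical stand-ins, and the infoset produced by the real PCAP schema will differ.

```python
import xml.etree.ElementTree as ET

FILTERED_IP = "192.168.170.8"


def drop_packets_with_ip(xml_text: str, ip: str = FILTERED_IP) -> str:
    """Remove every direct child of the root that mentions `ip` anywhere inside it."""
    root = ET.fromstring(xml_text)
    # Copy the child list first, because root is mutated inside the loop.
    for child in list(root):
        # iter() walks the child itself and all of its descendants.
        if any((node.text or "").strip() == ip for node in child.iter()):
            root.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical infoset: element names are illustrative, not the PCAP schema's.
SAMPLE = """<PCAP>
  <Packet><IPSrc>192.168.170.8</IPSrc><IPDest>192.168.170.20</IPDest></Packet>
  <Packet><IPSrc>192.168.170.20</IPSrc><IPDest>192.168.170.8</IPDest></Packet>
  <Packet><IPSrc>10.0.0.1</IPSrc><IPDest>10.0.0.2</IPDest></Packet>
</PCAP>"""

if __name__ == "__main__":
    print(drop_packets_with_ip(SAMPLE))
```

    In this sample, the first two packets are dropped because the address appears as either source or destination, and only the third is kept.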
    
    To test this flow, perform the following:
    ```
    cp /tmp/nifi/PCAP/src/test/resources/com/tresys/pcap/data/dns.cap /tmp/nifi/getfile/
    ```
    The original dns.cap file has about 40 packets. After filtering, the new PCAP file written by Daffodil contains approximately 10 packets that were not filtered out.
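    
    One way to check these packet counts without extra tooling is a small Python script that walks the record headers of a classic pcap file. This is a minimal sketch, assuming the classic (non-pcapng) pcap layout of a 24-byte global header followed by 16-byte record headers; the function name `count_pcap_packets` is made up for this example.

```python
import struct
import sys
from pathlib import Path

# Magic bytes as they appear on disk, mapped to the struct byte-order prefix.
_MAGIC_TO_ENDIAN = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}

GLOBAL_HEADER_LEN = 24
RECORD_HEADER_LEN = 16


def count_pcap_packets(data: bytes) -> int:
    """Count the packet records in the bytes of a classic pcap file."""
    if len(data) < GLOBAL_HEADER_LEN:
        raise ValueError("file is shorter than a pcap global header")
    endian = _MAGIC_TO_ENDIAN.get(data[:4])
    if endian is None:
        raise ValueError("not a classic pcap file")

    offset = GLOBAL_HEADER_LEN
    count = 0
    while offset + RECORD_HEADER_LEN <= len(data):
        # Record header: ts_sec, ts_frac, incl_len, orig_len (all uint32).
        _sec, _frac, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset
        )
        offset += RECORD_HEADER_LEN + incl_len
        if offset > len(data):
            raise ValueError("truncated packet record")
        count += 1
    return count


if __name__ == "__main__":
    # Usage: python count_packets.py FILE [FILE ...]
    for name in sys.argv[1:]:
        print(name, count_pcap_packets(Path(name).read_bytes()))
```

    Running it against the original file in the PCAP checkout and against the file that appears in ``/tmp/nifi/putfile`` (the output file name depends on the flow's configuration) should show the before and after counts.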
    



---