Posted to users@daffodil.apache.org by Roded Bahat <ro...@model9.io> on 2023/03/13 10:36:43 UTC

Questions from a Daffodil newbie

Hi all,
I'm looking into integrating Apache Daffodil into our product and have
several questions for which I could not find answers in the
documentation or issues.

1. Is it currently possible to extend Daffodil with custom types? For
example, could I create a custom field type for a field compressed with
a custom compression and have Daffodil call my own code for further
parsing of the original field value?
2. The DFDL spec states that additional implementation-defined encoding
names can be defined. How would a custom encoding be defined in the
DFDL specification?
3. Is it currently possible to parse an input stream but output only a
set of fields from the specification? For example, could an XPath be
specified to determine which nodes in the specification Daffodil will
output?
4. Is there a recommended way of dynamically creating a DFDL
specification XSD? Or should I just use general tooling?

Any pointers and help would be much appreciated.
Thanks!

Roded

Re: Questions from a Daffodil newbie

Posted by Roded Bahat <ro...@model9.io>.
Hi Claude,
I'm trying to mainly parse binary data and collect it with type
information in memory, so if my understanding is correct (?) Smooks is
less ideal for my use case.
I'll look into Mustache and XSLT as you suggest.
Thanks for the answers.
Roded



Re: Questions from a Daffodil newbie

Posted by Claude Mamo <cl...@gmail.com>.
Hi Roded,


> 3. Is it currently possible to parse an input stream but output only a set
> of fields from the specification? For example, could an XPath be specified
> to determine which nodes in the specification Daffodil will output?
>

It's possible with Smooks's DFDL cartridge
<https://github.com/smooks/smooks-dfdl-cartridge>. You can select which
elements you want to handle from the stream that Daffodil produces using
Smooks's XPath-like language and then efficiently process the selected
elements in whichever way you want.

> 4. Is there a recommended way of dynamically creating a DFDL specification
> XSD? Or should I just use general tooling?
>

In the past, I've used Mustache <https://github.com/spullara/mustache.java>
and XSLT to generate DFDL schemas. It worked well for me.

Claude


Re: Questions from a Daffodil newbie

Posted by Roded Bahat <ro...@model9.io>.
Many thanks for the detailed answers; they really help a lot.
I'll start experimenting with Apache Daffodil and ask follow-up
questions in new, specific threads.
Much obliged.



Re: Questions from a Daffodil newbie

Posted by Steve Lawrence <sl...@apache.org>.
Here are some high-level answers. If you need more details on anything,
let us know.

1. Yep, we call this feature "layers". You can create a custom layer
plugin that receives data (as defined by the DFDL schema); your layer
code transforms (e.g. decompresses) and outputs that data, and Daffodil
then parses the output as defined by the DFDL schema.
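
As a rough illustration of the layer idea only (this is not Daffodil's
layer API; the class and method names below are hypothetical), the
following JDK-only sketch wraps a raw input stream in a decompressing
stream, so that the downstream "parser" only ever sees the transformed
bytes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class LayerIdeaSketch {
    // Hypothetical stand-in for a layer: wraps the raw input so that
    // downstream code only sees the transformed (decompressed) bytes.
    static InputStream unwrapLayer(InputStream raw) throws IOException {
        return new GZIPInputStream(raw);
    }

    // Stand-in for "parsing the output": here it simply reads UTF-8 text.
    static String parsePayload(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Helper that produces gzip-compressed test data.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Compress, pass through the "layer", then "parse" the result.
    static String roundTrip(String text) {
        try {
            InputStream raw = new ByteArrayInputStream(gzip(text));
            return parsePayload(unwrapLayer(raw));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("field=42"));
    }
}
```

A real layer would implement the interfaces shown in the linked
sources rather than these stand-ins.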

Here are implementations of the layers included with Daffodil for gzip, 
base64, line folding, and byte swapping:

https://github.com/apache/daffodil/tree/main/daffodil-runtime1-layers/src/main/scala/org/apache/daffodil/layers/runtime1

And they are pluggable using Java service loaders, e.g.:

https://github.com/apache/daffodil/blob/main/daffodil-runtime1-layers/src/main/resources/META-INF/services/org.apache.daffodil.runtime1.layers.LayerCompiler

So you can create the layer outside of Daffodil, create a jar with the 
right services file, put it on the classpath and Daffodil will be able 
to find and use it.
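
For readers unfamiliar with Java service loaders, here is a minimal,
self-contained sketch of the general mechanism. The Transform
interface and Rot13Transform provider are hypothetical stand-ins, not
Daffodil types, and the services file is written to a temporary
directory at run time only so that the example fits in one file; a
real plugin would ship that file inside its jar.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class ServiceLoaderSketch {
    /** Hypothetical plugin interface (stands in for the real service type). */
    public interface Transform {
        String name();
    }

    /** Hypothetical provider; it needs a public no-argument constructor. */
    public static class Rot13Transform implements Transform {
        public Rot13Transform() {}
        @Override public String name() { return "rot13"; }
    }

    /** Registers the provider in a services file, then discovers it. */
    static List<String> discover() {
        try {
            Path root = Files.createTempDirectory("plugin-sketch");
            Path services =
                Files.createDirectories(root.resolve("META-INF").resolve("services"));
            // File name = binary name of the service interface;
            // file content = binary name of the provider class.
            Files.write(services.resolve(Transform.class.getName()),
                (Rot13Transform.class.getName() + "\n").getBytes(StandardCharsets.UTF_8));
            URL[] classpath = { root.toUri().toURL() };
            try (URLClassLoader loader = new URLClassLoader(
                    classpath, ServiceLoaderSketch.class.getClassLoader())) {
                List<String> names = new ArrayList<>();
                for (Transform t : ServiceLoader.load(Transform.class, loader)) {
                    names.add(t.name());
                }
                return names;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discover());
    }
}
```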

And here is the design proposal of the feature with more details and 
links to related design pages:

https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Dynamically+loading+Layer+Transformations


2. I don't think we have any documentation, but we have a number of
examples of how to define custom charsets. For example, here's a fairly
small IBM037 charset that we include in Daffodil, which is just a lookup
table:

https://github.com/apache/daffodil/blob/main/daffodil-io/src/main/scala/org/apache/daffodil/io/processors/charset/IBM037.scala

You essentially just need to implement BitsCharsetDefinition, which
returns a "BitsCharset" that can create a BitsCharsetEncoder/Decoder.
Depending on the complexity of your charset, you may be able to use
existing base classes (e.g. BitsCharsetJava) that do a lot of the heavy
lifting.
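
To make the "lookup table" idea concrete, here is a toy sketch of a
table-driven charset. It is purely illustrative: the four-symbol
encoding and the class name are made up, and it does not implement
BitsCharsetDefinition or any other Daffodil interface.

```java
class LookupCharsetSketch {
    // Hypothetical toy charset: byte values 0..3 map to these characters.
    private static final char[] DECODE_TABLE = {'A', 'C', 'G', 'T'};

    // Decodes each byte through the table; unknown codes become U+FFFD.
    static String decode(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte b : data) {
            int code = b & 0xFF; // treat the byte as unsigned
            sb.append(code < DECODE_TABLE.length ? DECODE_TABLE[code] : '\uFFFD');
        }
        return sb.toString();
    }

    // Encodes by reverse lookup; rejects characters outside the table.
    static byte[] encode(String text) {
        byte[] out = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            int code = -1;
            for (int j = 0; j < DECODE_TABLE.length; j++) {
                if (DECODE_TABLE[j] == text.charAt(i)) {
                    code = j;
                    break;
                }
            }
            if (code < 0) {
                throw new IllegalArgumentException(
                    "Unmappable character: " + text.charAt(i));
            }
            out[i] = (byte) code;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {2, 0, 3, 3, 0, 1, 0}));
    }
}
```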

Note that these are also loaded using Java service loaders, e.g.:

https://github.com/apache/daffodil/blob/main/daffodil-io/src/main/resources/META-INF/services/org.apache.daffodil.io.processors.charset.BitsCharsetDefinition


3. Not at the moment. If you wanted only a subset of fields, you would
need to post-process the output and extract the parts you need
yourself. Languages like XSLT/XQuery could probably do this without too
much effort.

Another alternative would be to create a custom InfosetOutputter that 
would ignore infoset events that you don't care about and keep those you 
do. You could use your own logic for how you determine which fields are 
important, or you could also use dfdlx:runtimeProperties to annotate the 
schema and have your custom InfosetOutputter use those. Here's the 
design information on runtime properties:

https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Runtime+Properties

Here's a small example of a custom InfosetOutputter we use for testing,
which just captures all events and stores them in a list. You could
imagine doing some sort of filtering, capturing only the fields you
want and outputting them to a custom data structure instead of XML.

https://github.com/apache/daffodil/blob/main/daffodil-japi/src/test/java/org/apache/daffodil/example/TestInfosetOutputter.java
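
The filtering idea can be sketched independently of Daffodil. The
ElementEvents interface below is a hypothetical, simplified stand-in
for infoset events (it is not the InfosetOutputter API, whose method
names and signatures differ); the collector keeps only the simple
elements whose paths are in an allow-list and stores them in a map
instead of producing XML.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class FilteringCollectorSketch {
    /** Hypothetical, simplified event callbacks (not Daffodil's API). */
    interface ElementEvents {
        void startComplex(String name);
        void simple(String name, String value);
        void endComplex(String name);
    }

    /** Keeps only the simple elements whose full path is wanted. */
    static final class FilteringCollector implements ElementEvents {
        private final Set<String> wantedPaths;
        private final Deque<String> open = new ArrayDeque<>();
        private final Map<String, String> kept = new LinkedHashMap<>();

        FilteringCollector(Set<String> wantedPaths) {
            this.wantedPaths = wantedPaths;
        }

        @Override public void startComplex(String name) { open.addLast(name); }

        @Override public void endComplex(String name) { open.removeLast(); }

        @Override public void simple(String name, String value) {
            String path = fullPath(name);
            if (wantedPaths.contains(path)) {
                kept.put(path, value); // everything else is dropped
            }
        }

        // Builds "/outer/inner/leaf" from the currently open elements.
        private String fullPath(String leaf) {
            StringBuilder sb = new StringBuilder();
            for (String p : open) {
                sb.append('/').append(p);
            }
            return sb.append('/').append(leaf).toString();
        }

        Map<String, String> result() { return kept; }
    }

    // Replays a small, made-up event sequence through the collector.
    static Map<String, String> demo() {
        Set<String> wanted =
            new HashSet<>(Arrays.asList("/record/id", "/record/body/temp"));
        FilteringCollector c = new FilteringCollector(wanted);
        c.startComplex("record");
        c.simple("id", "7");
        c.simple("padding", "0000");
        c.startComplex("body");
        c.simple("temp", "21.5");
        c.endComplex("body");
        c.endComplex("record");
        return c.result();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```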


4. I haven't personally done a lot of DFDL schema generation, though I
know other Daffodil devs have; they may be able to chime in with helpful
tips. But I don't think it's anything unique, really. I think what they
mostly do is get a machine-readable specification of the data format,
load that into some model, and then iterate over the model and output
strings to a file. We're very familiar with Scala, so we tend to write
DFDL schema generators in that, which is also nice since it has language
support for XML, so XML templates are sort of built into the language.
But any template language would probably work fine.
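
The "load a model, iterate, output strings" approach can be sketched
in a few lines of plain Java. The Field model and generator below are
hypothetical, and the output is only a skeleton: it assumes
fixed-length fields and valid XML names, and it omits the default
dfdl:format annotation that a complete DFDL schema would need.

```java
import java.util.Arrays;
import java.util.List;

class SchemaGeneratorSketch {
    /** Hypothetical in-memory model of one fixed-length field. */
    static final class Field {
        final String name;
        final String xsType;
        final int length;

        Field(String name, String xsType, int length) {
            this.name = name;
            this.xsType = xsType;
            this.length = length;
        }
    }

    /** Iterates over the model and emits a schema skeleton as a string. */
    static String generate(String rootName, List<Field> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"\n");
        sb.append("           xmlns:dfdl=\"http://www.ogf.org/dfdl/dfdl-1.0/\">\n");
        sb.append("  <xs:element name=\"").append(rootName).append("\">\n");
        sb.append("    <xs:complexType>\n");
        sb.append("      <xs:sequence>\n");
        for (Field f : fields) {
            // One element per model entry; names are assumed to be valid XML names.
            sb.append("        <xs:element name=\"").append(f.name)
              .append("\" type=\"").append(f.xsType)
              .append("\" dfdl:lengthKind=\"explicit\" dfdl:length=\"")
              .append(f.length).append("\"/>\n");
        }
        sb.append("      </xs:sequence>\n");
        sb.append("    </xs:complexType>\n");
        sb.append("  </xs:element>\n");
        sb.append("</xs:schema>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Field> model = Arrays.asList(
            new Field("id", "xs:int", 4),
            new Field("label", "xs:string", 16));
        System.out.print(generate("record", model));
    }
}
```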

- Steve


