Posted to dev@daffodil.apache.org by Christofer Dutz <ch...@c-ware.de> on 2019/01/22 09:16:11 UTC

How to achieve lengthKind=”endOfParent” without using endOfParent?

Hi all,

I am stuck with a little problem … I am reading a packet, which is usually contained inside another. Therefore it doesn't provide any means of determining its length.
So the packet is just a small header + binary data … now I want to read "all the rest" after the header into a field "userData".
In the DFDL documentation at IBM I read that lengthKind="endOfParent" would be what I'm looking for.

Unfortunately this doesn't seem to be supported … so how can I achieve the same with implemented options?

Chris

Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Posted by Christofer Dutz <ch...@c-ware.de>.
Thinking of it a bit more:
I initially wanted to model each protocol layer separately in order to be able to re-use layers. But so far I haven't encountered a single industry protocol utilizing the same protocol twice.

So I'm thinking of joining the three parts (or at least the lower two) into one format definition. Then I have the total length and at least that problem is solved. If Daffodil then starts supporting that feature, I might split it again.

Chris


________________________________
From: Christofer Dutz
Sent: Tuesday, January 22, 2019 5:58:33 PM
To: dev@daffodil.apache.org; Steve Lawrence
Subject: Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Hi Steve,

The code is in the plc4x repo I posted several times now. Unfortunately I'm sitting in a train without my laptop. It's the COTP protocol. There's a matching tdml test with commented out binary payload. That's what I'm trying to read.

Could probably post the links some time this evening.

Chris


________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Tuesday, January 22, 2019 5:17:24 PM
To: dev@daffodil.apache.org; Christofer Dutz
Subject: Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

There isn't a concept of a global length of input since some inputs
could be streaming and so we don't actually know the length until the
end of data is reached.

I guess it isn't clear to me what your data looks like. I /think/
delimited hexBinary should work. If the parent element does not have a
length, delimited hex binary should consume all available data up until
the end. Could you provide a little more detail on what your data looks
like (e.g. what has known lengths, headers, user data, etc.)?

As far as implementing lengthKind="endOfParent", I don't think the current
Daffodil devs have the resources to implement it right now.
Most of us are focused on other tasks at the moment. Though, it's
definitely possible to implement it--there aren't any real technical
limitations that I know of with the current code base--but it probably
would be a decent amount of work and would be an ambitious task for a
first time Daffodil contributor. Such a feature touches a lot of
different parts of Daffodil so there's a lot to learn. We're more than
happy to provide guidance if you do want to contribute this feature, and
it probably could be done in reasonably sized chunks, but I'd first want
to confirm that there isn't an alternative.

- Steve


On 1/22/19 10:35 AM, Christofer Dutz wrote:
> Hi Steve,
>
> well the problem is that I don't have the parent length in the current context.
>
> Without it, it doesn't seem to work.
>
> If there was some sort of global variable providing the total length of the entire input, that would be awesome.
> As I mentioned, the length information is in the surrounding protocol, and I wanted to model them all as separately as possible.
>
> Would it be possible to implement lengthKind="endOfParent"? Would it be a lot of work? Could I help with it?
>
> Chris
>
>
>
> On 22.01.19, 15:48, "Steve Lawrence" <sl...@apache.org> wrote:
>
>     Correct, lengthKind="endOfParent" has not been implemented yet.
>
>     As an alternative that we do support, you should be able to use
>     dfdl:lengthKind="delimited" for the hexBinary user data. In this case,
>     there's no delimiter, but parent length sort of acts like one. For example:
>
>       <xs:element name="Parent"
>         dfdl:lengthKind="explicit" dfdl:length="4"
>         dfdl:lengthUnits="bytes">
>         <xs:complexType>
>           <xs:sequence>
>             <xs:element name="Header" type="xs:hexBinary"
>               dfdl:lengthKind="explicit" dfdl:length="1"
>               dfdl:lengthUnits="bytes" />
>             <xs:element name="UserData" type="xs:hexBinary"
>               dfdl:lengthKind="delimited" dfdl:encoding="ISO-8859-1"/>
>           </xs:sequence>
>         </xs:complexType>
>       </xs:element>
>
>     So the parent element is 4 bytes and the header is 1 byte. If we parse
>     the data:
>
>       0xAA BB CC DD
>
>     We get the following infoset
>
>       <Parent>
>         <Header>AA</Header>
>         <UserData>BBCCDD</UserData>
>       </Parent>
>
>     And the UserData is the remaining three bytes. Using
>     lengthKind="endOfParent" would probably have better performance if we
>     implemented it, but this should give the same result for the hexBinary
>     blob at the end.
>
>     - Steve
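As a rough illustration of the "parent length sort of acts like a delimiter" idea in the quoted example, here is a minimal Python sketch. It is not Daffodil code; the function name parse_parent and its parameters are made up for this note, and it only mimics the slicing behavior described for the 4-byte Parent with a 1-byte Header.

```python
def parse_parent(data: bytes, parent_length: int = 4, header_length: int = 1) -> dict:
    """Split a fixed-length parent into a header and 'everything else'.

    Hypothetical helper: it mirrors the Parent/Header/UserData example,
    not any real Daffodil API.
    """
    if len(data) < parent_length:
        raise ValueError("not enough data for the parent element")
    # The parent's explicit length bounds what its children may consume.
    parent = data[:parent_length]
    header = parent[:header_length]
    # No delimiter is defined, so UserData runs to the end of the parent.
    user_data = parent[header_length:]
    return {"Header": header.hex().upper(), "UserData": user_data.hex().upper()}


if __name__ == "__main__":
    print(parse_parent(bytes.fromhex("AABBCCDD")))
```

Bytes that follow the parent (for example a fifth byte after AA BB CC DD) are left untouched by this sketch, which is the point of bounding the child by the parent length.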


Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Posted by Christofer Dutz <ch...@c-ware.de>.
Will try that first, first thing tomorrow.
However I could swear that I had it exactly that way and it didn't work ...

Thanks for your help.

Chris


________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Tuesday, January 22, 2019 6:44:38 PM
To: Christofer Dutz; dev@daffodil.apache.org
Subject: Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Yep, I think hexBinary with dfdl:lengthKind="delimited" should work for
your case. I've modified the userData element to look like this:

  <xs:element name="userData" type="xs:hexBinary"
    dfdl:byteOrder="bigEndian" dfdl:lengthKind="delimited"
    dfdl:encoding="ISO-8859-1" dfdl:textTrimKind="none" />

This will cause the userData field to consume all data until the end of
the input. Note that delimited hexBinary is treated like string data, so
the encoding and textTrimKind properties need to be specified--it might
make sense to move them to the cotpFormat.

I'm guessing the test you're talking about is "scenarioDataTpdu". With
the above change to the schema and using the data from that test:

  02F080320700000300000800080001120411440100ff09000401320004

The resulting infoset is:

  <cotp:CoTpTPDU xmlns:cotp="http://plc4x.apache.org/cotp">
    <headerLength>2</headerLength>
    <type>240</type>
    <cotp:CotpTpduData>
      <endOfTransmission>1</endOfTransmission>
      <tpduRef>0</tpduRef>
    </cotp:CotpTpduData>
    <userData>320700000300000800080001120411440100FF09000401320004</userData>
  </cotp:CoTpTPDU>

Three bytes total are consumed for the headerLength, type, and
CotpTpduData field, and the remaining bytes end up in the userData field
as hexBinary. If there is no remaining data in the input, then the
<userData> element is just empty (i.e. <userData />).

- Steve
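To make the byte accounting in the message above concrete, here is a small Python sketch that decodes the same sample data by hand. It is only an approximation of what the schema does: the function name is invented for this note, and the bit layout of the third byte (top bit as endOfTransmission, low seven bits as tpduRef) is an assumption inferred from the infoset shown, not taken from the plc4x schema itself.

```python
def parse_cotp_data_tpdu(packet: bytes) -> dict:
    """Hand-decode a COTP data TPDU into the fields shown in the infoset.

    Assumed layout (hypothetical, inferred from the example):
      byte 0 = headerLength, byte 1 = type,
      byte 2 = endOfTransmission (1 bit) + tpduRef (7 bits),
      everything after that = userData.
    """
    if len(packet) < 3:
        raise ValueError("need at least 3 header bytes")
    flags = packet[2]
    return {
        "headerLength": packet[0],
        "type": packet[1],
        "endOfTransmission": flags >> 7,
        "tpduRef": flags & 0x7F,
        # "All the rest": may be empty if nothing follows the header.
        "userData": packet[3:].hex().upper(),
    }


if __name__ == "__main__":
    sample = bytes.fromhex(
        "02F080320700000300000800080001120411440100ff09000401320004"
    )
    for name, value in parse_cotp_data_tpdu(sample).items():
        print(name, value)
```

With only the three header bytes as input, userData comes back as an empty string, matching the empty <userData /> case mentioned above.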



On 1/22/19 11:58 AM, Christofer Dutz wrote:
> Hi Steve,
>
> The code is in the plc4x repo I posted several times now. Unfortunately I'm
> sitting in a train without my laptop. It's the COTP protocol. There's a matching
> tdml test with commented out binary payload. That's what I'm trying to read.
>
> Could probably post the links some time this evening.
>
> Chris
>
> Outlook für Android <https://aka.ms/ghei36> herunterladen
>
> --------------------------------------------------------------------------------
> *From:* Steve Lawrence <sl...@apache.org>
> *Sent:* Tuesday, January 22, 2019 5:17:24 PM
> *To:* dev@daffodil.apache.org; Christofer Dutz
> *Subject:* Re: How to achieve lengthKind=”endOfParent” without using endOfParent?
> There isn't a concept of a global length of input since some inputs
> could be streaming and so we don't actually know the length until the
> end of data is reached.
>
> I guess it isn't clear to me what your data looks like. I /think/
> delimited hexBinary should work. If the parent element does not have a
> length, delimited hex binary should consume all available data up until
> the end. Could you provide a little more detail on what your data looks
> like (e.g. what has a known lengths, headers, user data, etc.)
>
> As far as implementing lengthKind="prefixed", I don't think the current
> Daffodil devs have the resources to implement endOfParent right now.
> Most of us are focused on other tasks at the moment. Tough, it's
> definitely possible to implement it--there aren't any real technical
> limitations that I know of with the current code base--but it probably
> would be a decent amount of work and would be an ambitious tasks for a
> first time Daffodil contributor. Such a feature touches a lot of
> different parts of Daffodil so there's a lot to learn. We're more than
> happy to provide guidance if you do want to contribute this feature, and
> it probably could be done in reasonably sized chunks, but I'd first want
> to confirm that there isn't an alternative.
>
> - Steve
>
>
> On 1/22/19 10:35 AM, Christofer Dutz wrote:
>> Hi Steve,
>>
>> well the problem is that I don't have the parent length in the current context.
>>
>> Without it, it doesn't seem to work.
>>
>> If there was some sort of global variable providing the total length of the entire input, that would be awesome.
>> As I mentioned, the length information in in the surrounding protocol, I wanted to model them all as separate as possible.
>>
>> Would it be possible to implement lengthKind="endOfParent"? Would it be a lot of work? Could I help with it?
>>
>> Chris
>>
>>
>>
>> Am 22.01.19, 15:48 schrieb "Steve Lawrence" <sl...@apache.org>:
>>
>>     Correct, lengthKind="endOfParent" has not bee implemented yet.
>>
>>     As an alternative that we do support, you should be able to use
>>     dfdl:lengthKind="delimited" for the hexBinary user data. In this case,
>>     there's no delimiter, but parent length sort of acts like one. For example:
>>
>>       <xs:element name="Parent"
>>         dfdl:lengthKind="explicit" dfdl:length="4"
>>         dfdl:lengthUnits="bytes">
>>         <xs:complexType>
>>           <xs:sequence>
>>             <xs:element name="Header" type="xs:hexBinary"
>>               dfdl:lengthKind="explicit" dfdl:length="1"
>>               dfdl:lengthUnits="bytes" />
>>             <xs:element name="UserData" type="xs:hexBinary"
>>               dfdl:lengthKind="delimited" dfdl:encoding="ISO-8859-1"/>
>>           </xs:sequence>
>>         </xs:complexType>
>>       </xs:element>
>>
>>     So the parent element is 4 bytes and the header is 1 byte. If we parse
>>     the data:
>>
>>       0xAA BB CC DD
>>
>>     We get the following infoset
>>
>>       <Parent>
>>         <Header>AA</Header>
>>         <UserData>BBCCDD</UserData>
>>       </Parent>
>>
>>     And the UserData is the remaining three bytes. Using
>>     lengthKind="endOfParent" would probably have better performance if we
>>     implemented it, but this should give the same result for the hexBinary
>>     blob at the end.
>>
>>     - Steve
>>
>>
>>     On 1/22/19 4:16 AM, Christofer Dutz wrote:
>>     > Hi all,
>>     >
>>     > I am stuck with a little problem … I am reading a packet, which is usually contained inside another. Therefore it doesn’t provide any means of providing it’s length.
>>     > So the packet is just a small header + binary data … now I want to read “all the rest” after the header into a field “userData”.
>>     > In the DFDL documentation at IBM I could read that the lengthKind=”endOfParent” would be what I’m looking for.
>>     >
>>     > Unfortunately this doesn’t seem to be supported … so how can I achieve the same with implemented options?
>>     >
>>     > Chris
>>     >
>>
>>
>>
>


Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Posted by Steve Lawrence <sl...@apache.org>.
The behavior of the test suite depends on the defaultRoundTrip and
roundTrip attributes in the tdml:testSuite and tdml:parserTestCase
elements. The different values and how they affect round trip testing
are described at the end of our TDML page [1].

What you describe we call roundTrip="onePass", which is the same as
"true". It looks like you have defaultRoundTrip="true", so each TDML
test is parsed and compared with the expected infoset. If they match,
the infoset is "unparsed" (serialized) and compared with the original
input data. If either the infoset or unparsed data does not exactly
match, the TDML runner should cause a test failure.
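The one-pass flow described above (parse, compare infoset, unparse, compare bytes) can be sketched in a few lines of Python. This is not how the TDML runner is implemented; parse, unparse and one_pass_round_trip are toy stand-ins invented for this note, using a trivial "1-byte header + rest" format.

```python
def parse(data: bytes) -> dict:
    """Toy parser: first byte is the header, the rest is user data."""
    return {"header": data[:1].hex().upper(), "userData": data[1:].hex().upper()}


def unparse(infoset: dict) -> bytes:
    """Toy unparser (serializer): the inverse of parse()."""
    return bytes.fromhex(infoset["header"] + infoset["userData"])


def one_pass_round_trip(data: bytes, expected_infoset: dict) -> bool:
    """Parse, check the infoset, unparse, check the bytes."""
    actual = parse(data)
    if actual != expected_infoset:
        raise AssertionError(f"parse mismatch: {actual} != {expected_infoset}")
    unparsed = unparse(actual)
    if unparsed != data:
        raise AssertionError(f"unparse mismatch: {unparsed.hex()} != {data.hex()}")
    return True


if __name__ == "__main__":
    ok = one_pass_round_trip(
        bytes.fromhex("AABBCCDD"), {"header": "AA", "userData": "BBCCDD"}
    )
    print("round trip passed:", ok)
```

Either mismatch raises, which corresponds to the test failure described above.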

It's exciting to see the progress you're making! Your schemas look
really well written--for sure a good example of how to write DFDL
schemas. I'm definitely looking forward to seeing things progress with
the PLC4X project.

- Steve

[1] https://daffodil.apache.org/tdml/#round-trip-testing

On 1/25/19 11:26 AM, Christofer Dutz wrote:
> Hi Steve,
> 
> was a busy two days for me ... but now I got to come back to the fun stuff.
> 
> So I guess now I was able to finish both the s7 schema as well as the test-suite.
> I added the byte data of several packet captures and the parsing seems to be doing its job nicely.
> 
> It even helped diagnose a bug in our code by being able to adjust the format to another assumption and checking if it worked.
> 
> And thanks for your patience and continued assistance with this. But I think this is going to be a huge thing for PLC4X :-)
> 
> Regarding the performance question ... I got the numbers from the test-suite execution ... here every parsing operation is done exactly once.
> So I guess it's not quite representative.
> 
> One question however:
> How does a test in the test suite work ... does it take the binary input, parse that and compare it with the XML version, and then take the XML version, serialize it, and compare it to the byte version?
> Because initially I got errors while parsing and later I once got an error "Nope" when "unparsing" (I guess that's serializing) ... it would be great to know if it does that, as this way I would feel much more confident it's doing 100% what I want.
> 
> My next step would be to generate a new version of the S7 driver, that utilizes Daffodil for the serialization and deserialization ... then I'll probably do some benchmarks and compare to the hand-written code.
> 
> Nevertheless I think this will be a great way to implement new protocols as it's simply a lot faster to write such a schema (if you know how to do that).
> 
> Thanks again to you all,
> Chris
> 
> 
> 
> On 22.01.19, 23:02, "Steve Lawrence" <sl...@apache.org> wrote:
> 
>     If merged schemas allow you to access other fields to calculate the
>     length of the userData field instead of using delimited hexBinary, I
>     suspect you would see a noticeable performance increase.
>     
>     Delimited hexBinary is implemented as encoding the input bytes into
>     ISO-8859-1 characters and building up a string until a delimiter or end
>     of data is found. The resulting string is then decoded to get the hex
>     binary byte array. It's not terribly slow, but is inefficient compared
>     to how we normally get hexBinary bytes with an explicit length. In the
>     explicit length case, we know exactly how many bits to read and can read
>     the source bytes directly into a hexBinary array, avoiding all the
>     encoding/decoding/delimiter scanning complexity.
>     
>     - Steve
>     
>     On 1/22/19 3:48 PM, Christofer Dutz wrote:
>     > Hi Steve
>     > 
>     > Yup ... couldn't wait till tomorrow and yes ... 
>     > your option worked (Wonder what I had different)
>     > 
>     > Performance-wise ... would it be better to join the schemas?
>     > 
>     > As I will always parse all 3 schemas and use them for serialization.
>     > I could imagine a merged schema (where I can for example get the 
>     > length for COTP from the TPKT and use that for the userData)
>     > 
>     > Chris
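The performance remark quoted in this message (delimited hexBinary going through ISO-8859-1 text versus reading an explicit length directly) can be pictured with the Python sketch below. None of this is Daffodil code and the function names are invented here. The length arithmetic assumes the usual TPKT framing (a 4-byte header whose length field counts the whole packet) and a COTP headerLength that does not count its own octet; treat both as assumptions to check against the actual schemas.

```python
def read_delimited_style(data: bytes) -> bytes:
    """Mimic the delimited path: bytes -> ISO-8859-1 text -> bytes."""
    # ISO-8859-1 maps every byte value 0x00-0xFF to exactly one character,
    # so arbitrary binary data survives the detour unchanged.
    text = data.decode("iso-8859-1")
    # A real scanner would search `text` for a delimiter here; with no
    # delimiter defined, everything up to the end of data is taken.
    return text.encode("iso-8859-1")


def read_explicit(data: bytes, offset: int, length: int) -> bytes:
    """Mimic the explicit-length path: one direct slice, no scanning."""
    end = offset + length
    if end > len(data):
        raise ValueError("explicit length runs past the end of the data")
    return data[offset:end]


def user_data_length(tpkt_length: int, cotp_header_length: int) -> int:
    """Length of userData if the layers are merged (assumed framing)."""
    tpkt_header = 4                       # assumed fixed TPKT header size
    cotp_header = 1 + cotp_header_length  # length octet + announced header
    return tpkt_length - tpkt_header - cotp_header


if __name__ == "__main__":
    cotp = bytes.fromhex(
        "02F080320700000300000800080001120411440100ff09000401320004"
    )
    n = user_data_length(tpkt_length=4 + len(cotp), cotp_header_length=cotp[0])
    print(n, read_explicit(cotp, 3, n) == read_delimited_style(cotp[3:]))
```

Both paths give the same bytes; the difference is only in how much work is done to get there.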


Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi Steve,

was a busy two days for me ... but now I got to come back to the fun stuff.

So I guess now I was able to finish both the s7 schema as well as the test-suite.
I added the byte data of several packet captures and the parsing seems to be doing its job nicely.

It even helped diagnose a bug in our code by being able to adjust the format to another assumption and checking if it worked.

And thanks for your patience and continued assistance with this. But I think this is going to be a huge thing for PLC4X :-)

Regarding the performance question ... I got the numbers from the test-suite execution ... here every parsing operation is done exactly once.
So I guess it's not quite representative.

One question however:
How does a test in the test suite work ... does it take the binary input, parse that and compare it with the XML version, and then take the XML version, serialize it, and compare it to the byte version?
Because initially I got errors while parsing and later I once got an error "Nope" when "unparsing" (I guess that's serializing) ... it would be great to know if it does that, as this way I would feel much more confident it's doing 100% what I want.

My next step would be to generate a new version of the S7 driver, that utilizes Daffodil for the serialization and deserialization ... then I'll probably do some benchmarks and compare to the hand-written code.

Nevertheless I think this will be a great way to implement new protocols as it's simply a lot faster to write such a schema (if you know how to do that).

Thanks again to you all,
Chris



Am 22.01.19, 23:02 schrieb "Steve Lawrence" <sl...@apache.org>:

    If merged schemas allow you to access other fields to calculate the
    length of the userData field instead of using delimited hexBinary, I
    suspect you would see a noticeable performance increase.
    
    Delimited hexBinary is implemented as encoding the input bytes into
    ISO-8859-1 characters and building up a string until a delimiter or end
    of data is found. The resulting string is then decoded to get the hex
    binary byte array. It's not terribly slow, but is inefficient compared
    to how we normally get hexBinary bytes with an explicit length. In the
    explicit length case, we know exactly how many bits to read and can read
    the source bytes directly into a hexBinary array, avoiding all the
    encoding/decoding/delimiter scanning complexity.
    
    - Steve
    
    On 1/22/19 3:48 PM, Christofer Dutz wrote:
    > Hi Steve
    > 
    > Yup ... couldn't wait till tomorrow and yes ... 
    > your option worked (Wonder what I had different)
    > 
    > Performance-wise ... would it be better to join the schemas?
    > 
    > As I will always parse all 3 schemas and use them for serialization.
    > I could imagine a merged schema (where I can for example get the 
    > length for COTP from the KPKT and use that for the userData)
    > 
    > Chris
    > 
    > 
    > Am 22.01.19, 18:44 schrieb "Steve Lawrence" <sl...@apache.org>:
    > 
    >     Yep, I think hexBinay with dfdl:lengthKind="delimited" should work for
    >     your case. I've modified the userData element to look like this:
    >     
    >       <xs:element name="userData" type="xs:hexBinary"
    >         dfdl:byteOrder="bigEndian" dfdl:lengthKind="delimited"
    >         dfdl:encoding="ISO-8859-1" dfdl:textTrimKind="none" />
    >     
    >     This will cause the userData field to consume all data until the end of
    >     the input. Note that delimited hexBinary is treated like string data, so
    >     the encoding and textTrimKind properties need to be specified--it might
    >     make sense to move them to the cotpFormat.
    >     
    >     I'm guessing the test you're talking about is "scenarioDataTpdu". With
    >     the above change to the schema and using the data from that test:
    >     
    >       02F080320700000300000800080001120411440100ff09000401320004
    >     
    >     The resulting infoset is:
    >     
    >       <cotp:CoTpTPDU xmlns:cotp="http://plc4x.apache.org/cotp">
    >         <headerLength>2</headerLength>
    >         <type>240</type>
    >         <cotp:CotpTpduData>
    >           <endOfTransmission>1</endOfTransmission>
    >           <tpduRef>0</tpduRef>
    >         </cotp:CotpTpduData>
    >     <userData>320700000300000800080001120411440100FF09000401320004</userData>
    >       </cotp:CoTpTPDU>
    >     
    >     Three bytes total are consumed for the headerLength, type, and
    >     CotPTpduData field, and the remaining bytes end up in the userData field
    >     as hexBinary. If there is no remaining data in the input, then the
    >     <userData> element is just empty (i.e. <userData />).
    >     
    >     - Steve


Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Posted by Steve Lawrence <sl...@apache.org>.
Yep, that's a reasonable way to handle what would otherwise be an empty
choice branch. Zero-length numbers will cause a failure, as you saw. An
alternative might be to use xs:hexBinary with dfdl:length="0" for the
S7RequestPayloadReadVar element. You aren't allowed to have
integers/shorts/bytes/etc. with zero length, but you are allowed to
have zero-length hexBinary data. Then you do not need the minOccurs or
the complex type, which makes things a little simpler.
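
A minimal sketch of that hexBinary alternative, assuming the rest of the
format (e.g. a default dfdl:format for binary data) stays as it is. The
dfdl:lengthUnits="bytes" setting is an assumption, and this has not been
run against the S7 schema:

```xml
<!-- Zero-length hexBinary placeholder for the otherwise empty choice branch.
     No minOccurs and no wrapping complex type are needed.
     dfdl:lengthUnits="bytes" is an assumption, not taken from the S7 schema. -->
<xs:element name="S7RequestPayloadReadVar" type="xs:hexBinary"
  dfdl:lengthKind="explicit" dfdl:length="0"
  dfdl:lengthUnits="bytes"/>
```

I'd expect the element to then appear in the infoset with no content.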


As far as performance goes, I'm seeing about what I would expect. To
test, I used the data in the tpktPacketContainingCotpConnectResponse
test in tpkit-protocol.tdml. I tested performance with 300,000
iterations. I also needed to subtract 4 from the occursCount expression
in the CotpMessageType to account for the TPKT header. I'm not sure
whether the data in that test is meant to work with S7-full, but I
doubt that affects performance, since the packet doesn't have any S7
data (I think).

Parsing this data with just the TPKT schema, parsing maxes out at about
179000 parses/second, or 1.676 seconds for the 300,000 iterations.

Extracting the COTP from that data (i.e. removing the first four
bytes), I can parse using the COTP schema about 45700 times/second, or
6.565 seconds. Quite a bit slower, but COTP is more complex, so not
totally unexpected.

The combined time is about 8.241 seconds.

Parsing the original data with the S7-full schema, I get about 42400
times/second, or 7.075 seconds. So it's a little slower than just the
COTP schema but faster than the combined time, which I think makes
sense. COTP doesn't need to parse the TPKT header like S7-full does, so
S7-full should be slower. But S7-full also doesn't double count the
userData parse time, which is what combining the TPKT+COTP times
effectively does, so it should be faster than the combined time.

My guess is that maybe your JVM just isn't warmed up enough? I think I
needed to get above 100,000 iterations before reaching the maximum parse
speed.

FYI, to get these numbers I used the daffodil performance subcommand in
the CLI, e.g.

  daffodil performance -N 300000 -s schemaPath testData.bin

- Steve

On 1/23/19 12:12 PM, Christofer Dutz wrote:
> Hi all,
> 
> OK, so I solved this one myself.
> 
> During my search I stumbled over DFDL-1355 (https://opensource.ncsa.illinois.edu/jira/browse/DFDL-1355),
> where they say: "DFDL spec says "The Root of the Branch MUST NOT be optional. That is XSDL minOccurs MUST BE greater than 0.""
> 
> So I thought: oh well, then let me just create a complex type element containing a sequence with one optional element. That seems to have solved the problem.
> 
> So now my type looks like this:
> 
>     <xs:element name="S7RequestPayloadReadVar">
>         <xs:complexType>
>             <xs:sequence>
>                 <xs:element name="payload" type="s7f:byte" minOccurs="0"/>
>             </xs:sequence>
>         </xs:complexType>
>     </xs:element>
> 
> And now it's working :-)
> 
> 
> Chris


Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi all,

OK, so I solved this one myself.

During my search I stumbled over DFDL-1355 (https://opensource.ncsa.illinois.edu/jira/browse/DFDL-1355),
where they say: "DFDL spec says "The Root of the Branch MUST NOT be optional. That is XSDL minOccurs MUST BE greater than 0.""

So I thought: oh well, then let me just create a complex type element containing a sequence with one optional element. That seems to have solved the problem.

So now my type looks like this:

    <xs:element name="S7RequestPayloadReadVar">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="payload" type="s7f:byte" minOccurs="0"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

And now it's working :-)


Chris



On 23.01.19, 17:25, "Christofer Dutz" <ch...@c-ware.de> wrote:

    Hi Steve,
    
    Now I've created a merged version of my 3 schemas in order to see if the performance is better.
    I noticed that when I run the tests, parsing usually takes about twice as long with the merged schema.
    The tests run on inputs targeting the first 2 levels. I know that when parsing a level-2 input
    I'm parsing a TPKT packet with an included COTP payload, so I'm actually parsing two levels. However, if
    I add the parsing time of TPKT to that of plain COTP, the sum is quite a bit lower than that of
    the combined schema. What could be causing this?
    
    And I ran into some problems again :/ ... in my S7 schema I have the case where I need to output an (empty) payload element to match a parameter element.
    
    Unfortunately doing this:
    
        <xs:element name="S7RequestPayloadReadVar" type="xs:byte" dfdl:lengthKind="explicit" dfdl:length="0"/>
    
    doesn't seem to work, and I get the following error:
    
    Expression Evaluation Error: Element s7f:S7RequestPayloadReadVar does not have a value.
    Schema context: element reference s7f:S7RequestPayloadReadVar Location line 423 column 46 in file:/Users/christofer.dutz/Projects/Apache/PLC4X/protocols/target/classes/org/apache/plc4x/protocols/s7-full-stack-protocol.dfdl.xsd
    
    How can I achieve this?
    
    Chris
    
    
    
    


Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi Steve,

I've now created a merged version of my 3 schemas to see whether the performance is better.
I noticed that when I run the tests, parsing usually takes about twice as long with the merged schema.
The tests run on inputs targeting the first 2 levels. I know that when parsing a level-2 input
I'm parsing a TPKT packet with an included COTP payload, so I'm actually parsing two levels. However, if
I add the parsing time of TPKT to that of plain COTP, the sum is still quite a bit lower than that of
the combined schema. What could be causing this?

I also ran into a problem again :/ ... in my S7 schema I have a case where I need to output an (empty) payload element to match a parameter element.

Unfortunately, doing this:

    <xs:element name="S7RequestPayloadReadVar" type="xs:byte" dfdl:lengthKind="explicit" dfdl:length="0"/>

doesn't seem to work, and I get the following error:

Expression Evaluation Error: Element s7f:S7RequestPayloadReadVar does not have a value.
Schema context: element reference s7f:S7RequestPayloadReadVar Location line 423 column 46 in file:/Users/christofer.dutz/Projects/Apache/PLC4X/protocols/target/classes/org/apache/plc4x/protocols/s7-full-stack-protocol.dfdl.xsd

How can I achieve this?

Chris




On 22.01.19, 23:02, "Steve Lawrence" <sl...@apache.org> wrote:

    If merged schemas allow you to access other fields to calculate the
    length of the userData field instead of using delimited hexBinary, I
    suspect you would see a noticeable performance increase.
    
    Delimited hexBinary is implemented as encoding the input bytes into
    ISO-8859-1 characters and building up a string until a delimiter or end
    of data is found. The resulting string is then decoded to get the hex
    binary byte array. It's not terribly slow, but is inefficient compared
    to how we normally get hexBinary bytes with an explicit length. In the
    explicit length case, we know exactly how many bits to read and can read
    the source bytes directly into a hexBinary array, avoiding all the
    encoding/decoding/delimiter scanning complexity.
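
To make the explicit-length case concrete, here is a minimal sketch of how a merged schema might compute the userData length from the enclosing TPKT length. It is only an illustration under assumptions: all element names (TpktPacket, tpktLength, cotpHeaderLength, cotpHeader) are hypothetical and not taken from the plc4x schemas, and a default dfdl:format (binary representation, big-endian, lengthUnits="bytes" and the other required properties) is assumed to be declared elsewhere.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <!-- Hypothetical merged TPKT + COTP layout; names are illustrative only. -->
  <xs:element name="TpktPacket">
    <xs:complexType>
      <xs:sequence>
        <!-- TPKT header (RFC 1006): version, reserved, total packet length.
             The length counts the whole packet, including these 4 bytes. -->
        <xs:element name="version" type="xs:unsignedByte"
          dfdl:lengthKind="explicit" dfdl:length="1" />
        <xs:element name="reserved" type="xs:unsignedByte"
          dfdl:lengthKind="explicit" dfdl:length="1" />
        <xs:element name="tpktLength" type="xs:unsignedShort"
          dfdl:lengthKind="explicit" dfdl:length="2" />

        <!-- COTP length indicator, followed by that many header bytes. -->
        <xs:element name="cotpHeaderLength" type="xs:unsignedByte"
          dfdl:lengthKind="explicit" dfdl:length="1" />
        <xs:element name="cotpHeader" type="xs:hexBinary"
          dfdl:lengthKind="explicit"
          dfdl:length="{ ../cotpHeaderLength }" />

        <!-- Remaining bytes: total length minus the 4-byte TPKT header,
             the 1-byte length indicator, and the COTP header itself. -->
        <xs:element name="userData" type="xs:hexBinary"
          dfdl:lengthKind="explicit"
          dfdl:length="{ ../tpktLength - 5 - ../cotpHeaderLength }" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

With this shape the parser knows the byte count of userData up front, so no delimiter scanning or string encoding/decoding should be needed for that field.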
    
    - Steve
    
    On 1/22/19 3:48 PM, Christofer Dutz wrote:
    > Hi Steve
    > 
    > Yup ... couldn't wait till tomorrow, and yes ...
    > your option worked (I wonder what I did differently).
    > 
    > Performance-wise ... would it be better to join the schemas?
    > 
    > I will always parse all 3 schemas and use them for serialization, so
    > I could imagine a merged schema (where I can, for example, get the
    > length for COTP from the TPKT and use that for the userData).
    > 
    > Chris
    > 
    > 
    
    


Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Posted by Steve Lawrence <sl...@apache.org>.
Yep, I think hexBinary with dfdl:lengthKind="delimited" should work for
your case. I've modified the userData element to look like this:

  <xs:element name="userData" type="xs:hexBinary"
    dfdl:byteOrder="bigEndian" dfdl:lengthKind="delimited"
    dfdl:encoding="ISO-8859-1" dfdl:textTrimKind="none" />

This will cause the userData field to consume all data until the end of
the input. Note that delimited hexBinary is treated like string data, so
the encoding and textTrimKind properties need to be specified--it might
make sense to move them to the cotpFormat.
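
As a rough sketch of what moving those properties into the shared format could look like: this assumes cotpFormat is a named dfdl:defineFormat in the COTP schema. Its real contents are not shown in this thread, so the definition below is hypothetical and lists only the properties mentioned above; a working schema would still need the remaining required DFDL properties.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:cotp="http://plc4x.apache.org/cotp"
           targetNamespace="http://plc4x.apache.org/cotp">

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Hypothetical: the actual cotpFormat may define more properties. -->
      <dfdl:defineFormat name="cotpFormat">
        <dfdl:format representation="binary" byteOrder="bigEndian"
                     encoding="ISO-8859-1" textTrimKind="none" />
      </dfdl:defineFormat>
      <!-- Make cotpFormat the default format for this schema. -->
      <dfdl:format ref="cotp:cotpFormat" />
    </xs:appinfo>
  </xs:annotation>

  <!-- The element then only needs to state what is specific to it. -->
  <xs:element name="userData" type="xs:hexBinary"
    dfdl:lengthKind="delimited" />

</xs:schema>
```

The benefit of this arrangement is that any other delimited hexBinary elements in the same schema would pick up the encoding and textTrimKind settings without repeating them.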

I'm guessing the test you're talking about is "scenarioDataTpdu". With
the above change to the schema and using the data from that test:

  02F080320700000300000800080001120411440100ff09000401320004

The resulting infoset is:

  <cotp:CoTpTPDU xmlns:cotp="http://plc4x.apache.org/cotp">
    <headerLength>2</headerLength>
    <type>240</type>
    <cotp:CotpTpduData>
      <endOfTransmission>1</endOfTransmission>
      <tpduRef>0</tpduRef>
    </cotp:CotpTpduData>
    <userData>320700000300000800080001120411440100FF09000401320004</userData>
  </cotp:CoTpTPDU>

Three bytes total are consumed for the headerLength, type, and
CotpTpduData fields, and the remaining bytes end up in the userData field
as hexBinary. If there is no remaining data in the input, then the
<userData> element is just empty (i.e. <userData />).

- Steve





Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi Steve,

The code is in the plc4x repo I have posted several times now. Unfortunately I'm sitting on a train without my laptop. It's the COTP protocol. There's a matching TDML test with a commented-out binary payload. That's what I'm trying to read.

I can probably post the links sometime this evening.

Chris



Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Posted by Steve Lawrence <sl...@apache.org>.
There isn't a concept of a global length of input since some inputs
could be streaming and so we don't actually know the length until the
end of data is reached.

I guess it isn't clear to me what your data looks like. I /think/
delimited hexBinary should work. If the parent element does not have a
length, delimited hex binary should consume all available data up until
the end. Could you provide a little more detail on what your data looks
like (e.g. what has known lengths, headers, user data, etc.)?

As far as implementing lengthKind="endOfParent" goes, I don't think the
current Daffodil devs have the resources to implement it right now.
Most of us are focused on other tasks at the moment. Though, it's
definitely possible to implement it--there aren't any real technical
limitations that I know of with the current code base--but it probably
would be a decent amount of work and would be an ambitious task for a
first-time Daffodil contributor. Such a feature touches a lot of
different parts of Daffodil so there's a lot to learn. We're more than
happy to provide guidance if you do want to contribute this feature, and
it probably could be done in reasonably sized chunks, but I'd first want
to confirm that there isn't an alternative.

- Steve




Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Posted by Christofer Dutz <ch...@c-ware.de>.
Hi Steve,

Well, the problem is that I don't have the parent length in the current context.

Without it, it doesn't seem to work.

If there were some sort of global variable providing the total length of the entire input, that would be awesome.
As I mentioned, the length information is in the surrounding protocol; I wanted to model the layers as separately as possible.

Would it be possible to implement lengthKind="endOfParent"? Would it be a lot of work? Could I help with it?

Chris





Re: How to achieve lengthKind=”endOfParent” without using endOfParent?

Posted by Steve Lawrence <sl...@apache.org>.
Correct, lengthKind="endOfParent" has not been implemented yet.

As an alternative that we do support, you should be able to use
dfdl:lengthKind="delimited" for the hexBinary user data. In this case,
there's no delimiter, but the parent length sort of acts like one. For example:

  <xs:element name="Parent"
    dfdl:lengthKind="explicit" dfdl:length="4"
    dfdl:lengthUnits="bytes">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Header" type="xs:hexBinary"
          dfdl:lengthKind="explicit" dfdl:length="1"
          dfdl:lengthUnits="bytes" />
        <xs:element name="UserData" type="xs:hexBinary"
          dfdl:lengthKind="delimited" dfdl:encoding="ISO-8859-1"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

So the parent element is 4 bytes and the header is 1 byte. If we parse
the data:

  0xAA BB CC DD

We get the following infoset:

  <Parent>
    <Header>AA</Header>
    <UserData>BBCCDD</UserData>
  </Parent>

And the UserData is the remaining three bytes. Using
lengthKind="endOfParent" would probably have better performance if we
implemented it, but this should give the same result for the hexBinary
blob at the end.
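
For reference, the example above could be captured as a TDML test, roughly as sketched below. This is only an illustration: the suite name, test name, and model file name (parent.dfdl.xsd) are hypothetical, and it assumes the Parent element is declared in a schema without a target namespace and with the usual default format properties in place.

```xml
<tdml:testSuite suiteName="endOfParentWorkaround"
  xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData">

  <!-- Parses the 4 input bytes and compares against the expected infoset. -->
  <tdml:parserTestCase name="delimitedUserData" root="Parent"
    model="parent.dfdl.xsd">
    <tdml:document>
      <!-- Input data given as hex bytes. -->
      <tdml:documentPart type="byte">AA BB CC DD</tdml:documentPart>
    </tdml:document>
    <tdml:infoset>
      <tdml:dfdlInfoset>
        <Parent>
          <Header>AA</Header>
          <UserData>BBCCDD</UserData>
        </Parent>
      </tdml:dfdlInfoset>
    </tdml:infoset>
  </tdml:parserTestCase>

</tdml:testSuite>
```

A test like this would also make it easy to check later that a future lengthKind="endOfParent" implementation produces the same infoset.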

- Steve

