Posted to dev@daffodil.apache.org by Steve Lawrence <sl...@apache.org> on 2021/01/05 19:26:25 UTC

The future of the daffodil DFDL schema debugger?

Now that we're in a new year, I'd like to start a discussion about the
Daffodil DFDL Schema debugger and how it might be improved to be more
useful.

Note that this is not about the capability to debug Daffodil itself in
something like Eclipse/IntelliJ, but the ability for Daffodil to provide
enough extra information during a parse/unparse so that a schema
developer can get an idea of what Daffodil is doing. This makes it
easier for users (rather than developers) to determine why a schema
isn't giving the expected parse/unparse result (either because of bad data
or a faulty schema).

The current debugger is enabled by providing the --debug or
--trace flags in the CLI. More information about that is here:

https://daffodil.apache.org/debugger/

This enables a TUI and commands somewhat similar to GDB, providing things
like breakpoints, stepping, displaying the current infoset, displaying a dump
of the data, etc.

Although I find this tool pretty useful, it definitely has some glaring
issues.

The most glaring to me is that it really isn't useful at all for
debugging unparse. The data dumps only include the main output stream,
so determining things like suspensions and buffered output is impossible.

Another issue is the infoset output. When outputting the infoset, the
debugger currently just walks the entire thing, converts it to XML,
and displays the XML. For large infosets, this is excessive and can make it
impossible to use, even with the configuration options that limit how much of
that infoset is actually printed to the screen. Also, things like large
hex binary blobs create excessive and unusable output.
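
As a rough illustration of the kind of limit that could help here, the
sketch below shortens a long simple value so it fits on one line. It is
Python rather than Daffodil code, and the function name elide, the
40-character default, and the " ..." marker are all assumptions made up
for this example.

```python
def elide(value: str, max_chars: int = 40, marker: str = " ...") -> str:
    """Shorten a simple value so it fits on one line of debugger output.

    Hypothetical helper: values no longer than max_chars are returned
    unchanged; longer ones are cut and end with the marker.
    """
    if len(value) <= max_chars:
        return value
    keep = max(max_chars - len(marker), 0)
    return value[:keep] + marker


# A large hexBinary value (512 hex digits) collapses to one short line.
blob = "89AB782F" * 64
print(elide(blob))
print(elide("short text"))
```

The caller still needs some way to ask for the full value, which is the
usability cost of any elision scheme.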

Another thing I feel is missing is a schema view. Right now it's very
difficult to know where in the schema Daffodil actually is.

I think these issues just need some thought and improvement. One could
imagine a better way to stringify our unparse buffers for debug. One
could imagine a way to receive infoset state changes so the debugger can
track things like backtracks and removed infoset elements. One could imagine
a way to display the schema.

We just need a better way to stringify the current state of the unparse
data including buffers, and we need a way for the debugger to receive
state change information about the infoset so it can update displays rather
than just constantly printing the entire infoset.
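
To make the state-change idea concrete, here is a minimal sketch in
Python (not Daffodil code) of a debugger-side view that applies
incremental events instead of re-rendering the whole infoset. The event
kinds (start, value, end, backtrack) and their fields are hypothetical;
nothing like them exists in Daffodil today.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


class InfosetView:
    """Keeps a local copy of the infoset, updated one event at a time."""

    def __init__(self) -> None:
        self.root = Node("<root>")
        self._open = [self.root]  # stack of elements currently being parsed

    def handle(self, event: dict) -> None:
        kind = event["kind"]
        if kind == "start":
            node = Node(event["name"])
            self._open[-1].children.append(node)
            self._open.append(node)
        elif kind == "value":
            self._open[-1].value = event["value"]
        elif kind == "end":
            self._open.pop()
        elif kind == "backtrack":
            # Assumed semantics: the innermost open element is discarded.
            if len(self._open) == 1:
                raise ValueError("nothing to backtrack")
            self._open.pop()
            # The open element is always its parent's most recent child.
            self._open[-1].children.pop()
        else:
            raise ValueError(f"unknown event kind: {kind}")


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, one per element."""
    text = node.name if node.value is None else f"{node.name}: {node.value}"
    lines = ["  " * depth + text]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


view = InfosetView()
for ev in [
    {"kind": "start", "name": "record"},
    {"kind": "start", "name": "length"},
    {"kind": "value", "value": "4"},
    {"kind": "end"},
    {"kind": "start", "name": "payload"},
    {"kind": "value", "value": "DEADBEEF"},
    {"kind": "backtrack"},
]:
    view.handle(ev)
print("\n".join(render(view.root)))
```

With events like these a display only has to touch the node that
changed, and a backtrack shows up as a subtree disappearing.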

However, I think another big issue is just usability in general. I
think the CLI usage is reasonable, but it's not always user friendly,
and it is difficult to view multiple things at the same time. I think
because of this very few people even use this tool. So this seems like
something perhaps worth focusing on.

My first thought for improving this usability issue would be to implement
the Debug Adapter Protocol (DAP)
(https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
which many IDEs implement. With this implemented, Daffodil could be
plugged in to any IDE that supports it and essentially get debugging for
free, without the need to worry about the GUI elements.

I do have concerns that this just wouldn't have enough of the functionality
that we'd really need. For example, DAP really only has the ability to show
code (Daffodil's equivalent is the DFDL schema). There isn't a way to
show a live view of the infoset or data. Most DAP IDEs do have a
console output, so we could potentially make it so the console output is
a live view of infoset/data. But I'm not even sure most DAP-friendly
IDEs could support this kind of console output. Does anyone have
familiarity with DAP IDEs and what kinds of console capabilities are
available?

I also looked into TUI libraries with the idea that we could just extend
our current debugger user interface to be a bit friendlier.
Unfortunately, there aren't too many Java/Scala TUI libraries, and those
that do exist don't have Apache-friendly licenses. We also want to be
careful about increasing dependencies just for a debugger that many people
might not use, so large graphics libraries are probably out of the question.

This also makes me wonder if an approach worth taking for the future of
Daffodil schema debugging is developing a sort of "Daffodil Debug
Protocol". I imagine it would be loosely based on DAP (which is
essentially JSON message based) but could be targeted to the things that
a DFDL schema debugger would really need. An added benefit with some
sort of protocol is the debugger interface can be uncoupled from
Daffodil itself, so we could implement a TUI/GUI/whatever in any
language/GUI framework and just have it communicate the protocol over
some form of IPC. Another benefit is that any future backends could
implement this protocol and so a single debugger could hook into
different backends without much issue. Unfortunately, defining such a
protocol might be a large task, but we do have our existing debug
infrastructure and things like DAP to guide its development/design.
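
For a sense of what "loosely based on DAP" could mean on the wire, the
Python sketch below frames JSON messages the way DAP's base protocol
does: a Content-Length header, a blank line, then a UTF-8 JSON body. The
event name infosetChanged and its body fields are invented for
illustration; only the framing and the seq/type/event/body envelope are
taken from DAP.

```python
import json


def encode(message: dict) -> bytes:
    """Frame one message as Content-Length header + blank line + JSON body."""
    body = json.dumps(message).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


def decode(buffer: bytes):
    """Return (message, remaining_bytes), or (None, buffer) if incomplete."""
    header, sep, rest = buffer.partition(b"\r\n\r\n")
    if not sep:
        return None, buffer  # header not fully received yet
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    if len(rest) < length:
        return None, buffer  # body not fully received yet
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


# Hypothetical event a Daffodil backend might send to a debugger front end.
event = {
    "seq": 1,
    "type": "event",
    "event": "infosetChanged",  # assumed name, not part of DAP
    "body": {"change": "added", "path": "/record/length", "value": "4"},
}
wire = encode(event)
message, remaining = decode(wire)
print(message["event"], len(remaining))
```

Because the framing is independent of the transport, the same bytes could
travel over stdin/stdout, a pipe, or a socket, which is what would let a
front end be written in any language.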

Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps we
really just need the few improvements mentioned to the existing
debugger. Is that enough to make it usable? Or is an entirely different
approach needed to debugging schemas?

Re: The future of the daffodil DFDL schema debugger?

Posted by Steve Lawrence <sl...@apache.org>.
This has strayed a bit from a debugger and more to improved usability in
general. I think that's definitely worth a discussion since usability is
probably one of the biggest hindrances to Daffodil acceptance.

Continuing this, I've always thought an online DFDL "playground" or
"fiddle" could be really useful for new users. A webpage that lets you
add a schema, add some data and parse/unparse it and get the result
could be useful. It lets new users play around with Daffodil without having
to install anything.

It could also have a button to create a TDML file to be
included in Daffodil tests. Or it could have shareable links to the
fiddle itself so others could run it online. That could be a great way
for users to start discussions and easily share DFDL snippets/examples
without having to learn TDML or set up IDEs.

And bringing this back to the debugger, I imagine the playground/fiddle
interface would be somewhat similar to a debugger (e.g. infoset view,
schema view, data view, etc.) and could additionally have
debugger/history tree views as well. It also has the benefit of being a
webserver, so you could run the core backend on any machine, and then
the display just needs a browser. So it's not dependent on any one GUI toolkit.

On 1/11/21 1:11 PM, Beckerle, Mike wrote:
> I'd like "report a bug" as an IDE-supported action that actually helps the user create a TDML file, annotate it with their commentary about the issue as well as just the raw schema and expected result, etc. Then upload or send it to the right place.
> 
> Similarly, an IDE-supported "ask a question" action that helps the user create a TDML file for discussion of their question would be useful. Many times people just ask for clarifications, and little tests that make the context clear/concrete and are actually runnable save much time and back-and-forth over unspecified aspects of the situation.
> 
> 
> ________________________________
> From: John Wass <jw...@gmail.com>
> Sent: Friday, January 8, 2021 4:45 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: The future of the daffodil DFDL schema debugger?
> 
> What other features could find a nice home in an IDE integration?  Having a
> single convenient entry point (the IDE) for such things would be nice, imo.
> 
> Things like...
> 
> - Rich set of actions for TDML
>   - Run a single test from a TDML file
>   - Debug/Run TDML
> - Run/Debug a data file with a schema from the project
>   - i.e., right-click on a JPG and have a context menu for Run with Daffodil ->
> pick from list of dfdl.xsd
> ...
> 
> 
> 
> On Fri, Jan 8, 2021 at 2:47 PM Beckerle, Mike <mb...@owlcyberdefense.com>
> wrote:
> 
>> Use cases or quasi-requirements. This is my summary so far.
>>
>> 1) capture a human-readable trace of parse/unparse information to a single
>> text file (might be same as 2 if machine-readable is sufficiently human
>> readable)
>>
>> 2) capture a machine-readable trace of parse/unparse information to a
>> single text file (might be same as 1 if human readable form is also machine
>> readable)
>>
>> 3) interactive debug from a command line - each display of information is
>> requested by a specific command (1 and 2 above might be using this with a
>> specific canned set of commands auto-issued to display various information,
>> and capturing all to an output stream)
>>
>> 4) interactive debug with multi-panel display where displays are
>> updated/animated automatically as debug context changes. (This is intended
>> to mean more than just opening all the schema files in different editor
>> windows - more than just gdb-style debug under Emacs.)
>>
>> 5) interactive debug time-machine - ability to backup to prior
>> parser/unparser states, move forward again, or just backup and re-check
>> something, but then jump forward to proceed from where one left off.
>>
>> 6) Non Use Case: IDE for DFDL with rich semantic model (akin to the DSOM
>> object model) of the schema.
>> This is here just to point out that it's really out of scope. There are
>> many questions about the schema (e.g., "can I add this property to this
>> element?") that are not required for the debugger. A full and powerful IDE
>> is great, but that's really entirely different than our goals for debugging
>> that we're trying to discuss here.
>>
>>
>> ________________________________
>> From: Sloane, Brandon <bs...@owlcyberdefense.com>
>> Sent: Thursday, January 7, 2021 1:25 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: The future of the daffodil DFDL schema debugger?
>>
>> We could also create a new flag for --trace that would format the trace
>> output in a more machine readable manner. This should let us accomplish
>> Larry's goals, and most of mine, with relatively little effort within
>> Daffodil (but still all the effort on the GUI side), and would allow for
>> off-site analysis in cases where it is not practical to attach a debugger
>> while Daffodil is running.
>> ________________________________
>> From: Sloane, Brandon <bs...@owlcyberdefense.com>
>> Sent: Thursday, January 7, 2021 1:21 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: Re: The future of the daffodil DFDL schema debugger?
>>
>> I've been thinking about a tool along similar lines (although more
>> integrated with Daffodil than post-processing the trace output).
>>
>> One thing to keep in mind is that, although the trace output is presented
>> as a linear log (since we do not have much choice), the actual process is
>> more of a tree, due to backtracking.
>>
>> Ideally, we would have a multi-pane window showing:
>>
>>
>>   *   The hex/binary data
>>   *   The infoset
>>   *   A time-axis parse tree; with a "major" node at every point of
>> uncertainty and parse error, and "minor" nodes at every parse step
>>   *   A view of the DFDL schema
>>   *   An interactive terminal debugger (e.g. what we currently have)
>>   *   Breakpoints/variables/delimiter-stack/etc
>>
>> Within these panes, you ought to be able to select a given region/element,
>> and highlight all the corresponding elements in the other panes.
>>
>> I think that exporting the necessary information from Daffodil to
>> implement all of this would be relatively straightforward. The only
>> potentially problematic parts I see are:
>>
>>   *   The interactive debugger would require some form of time-travel to
>> implement (I think most of the work for this is done to support backtracking)
>>   *   The memory requirements when used on large infosets
>>
>> ________________________________
>> From: Larry Barber <la...@nteligen.com>
>> Sent: Thursday, January 7, 2021 1:08 PM
>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> Subject: RE: The future of the daffodil DFDL schema debugger?
>>
>> When I was doing strange and unusual things with DFDL and generating a lot
>> of errors, I envisioned how helpful it would be to have a tool that would
>> post-process the --trace output and use it to display a dual pane window
>> (like the editor referenced below) with the schema on one side and hex
>> version on the other, with a slider that would allow me to flow through the
>> parsing action and see pointers as to where the parser was in both the
>> schema and input files. In other words just convert the information from
>> the --trace into a more useful graphical display.
>> Perhaps breakpoint like markers could be added to both files to quickly
>> scan through and display what sections of the schema read which locations
>> in the file, or vice versa.
>>
>> -----Original Message-----
>> From: Steve Lawrence [mailto:slawrence@apache.org]
>> Sent: Wednesday, January 6, 2021 1:42 PM
>> To: dev@daffodil.apache.org
>> Subject: Re: The future of the daffodil DFDL schema debugger?
>>
>> Yep, something like that seems very reasonable for dealing with large
>> infosets. But it feels like we still run into usability issues.
>> For example, what if a user wants to see more? We need some configuration
>> options to increase what we've elided. It's not big, but every new thing
>> that needs configuration adds complexity and decreases usability.
>>
>> And I think the only reason we are trying to spend effort eliding things
>> is because we're limited to this gdb-like interface where you can only
>> print out a little information at a time.
>>
>> I think what would really help is to dump this gdb interface and instead use
>> multiple windows/views. As a really close example to what I imagine, I
>> recently came across this hex editor:
>>
>>
>> https://www.synalysis.net/
>>
>> The screenshots are a bit small so it's not super clear, but this tool has
>> one view for the data in hex, and one view for a tree of parsed results
>> (which is very similar to our infoset). The "infoset" view has information
>> like offset/length/value, and can be related back to the data view to find
>> the actual bits.
>>
>> I imagine the "next generation daffodil debugger" to look much like this.
>> As data is parsed, the infoset view fills up. This view could act like a
>> standard GUI tree so you could collapse sections or scroll around to show
>> just the parts you care about, and have search capabilities to quickly jump
>> around. The advantage here is you no longer really need automated eliding
>> or heuristics for what the user *might* care about.
>> You just show the whole thing and let the user scroll around. As daffodil
>> parses and backtracks, this tree grows or shrinks.
>>
>> I also imagine you could have a cursor moving around the hex view, so as
>> daffodil moves around (e.g. scanning for delimiters, extracting integers),
>> one could update this data view to show what daffodil is doing and where it
>> is.
>>
>> I also imagine there could be other views as well. For example, a schema
>> view to show where in the schema daffodil is, and to add/remove
>> breakpoints. And an information view for things like variables, in-scope
>> delimiters, PoU's, etc.
>>
>> The only reason I mention a debug protocol is that it would allow this GUI to
>> be more easily written in something other than Java/Scala to take advantage
>> of other GUI toolkits. It's been a long while since I've done anything with
>> Java GUIs, but they seemed pretty poor the last I looked at them. It would
>> even allow for a TUI, which Java has little/no support for. It also enables
>> things like remote debugging if a socket IPC was used. Though I'm not sure
>> all of that is necessary. Just thinking what would be ideal, and it can
>> always be pared back.
>>
>>
>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>>> I don't think of it as a daffodil debug protocol, but just a separation
>> of concerns between display of information and the behaviors of
>> parse/unparse that need to be points where users can pause, and data
>> structures available to display.
>>>
>>> E.g., it is 100% a display issue that the infoset (shown as XML) is
>> clumsy, too big, etc.  The infoset is available in the processor state, and
>> one can examine the current node, enclosing node, prior sibling(s),
>> following sibling(s), etc. One can elide contents that are too big for
>> hexBinary, etc.
>>>
>>> I think this problem, how to display the infoset with sensible limits on
>> sizing, is fairly easy to come up with some design for, that will at least
>> be (1) always fairly small (2) much more useful in more cases. It won't be
>> perfect but can be much better than what we do now.
>>>
>>> One sensible display "mode" should be that displaying the context
>>> surrounding the current element (when parsing or unparsing) displays
>>> at most N lines (N/2 before, N/2 after), each with a maximum length of L
>>> characters (settable within reason?)
>>>
>>> Sibling and enclosing nodes would be displayed eliding their contents to
>> at most 1 line.
>>>
>>> Here's an example of what I mean. Displaying up to N=10 lines total:
>>>
>>> ...
>>> <enclosingParent1>
>>>    ...
>>>    <priorSibling2>89ab782 ...</...>
>>>    <priorSibling1>some text is here and some more text</...>
>>>    <currentNode>value might be some big thing which needs to be elided ...</...>
>>>    <followingSibling1> ... </...>
>>>    ???
>>> </enclosingParent1>
>>> ???
>>>
>>> The </...> is just an idea to reduce XML matching end-tag clutter.
>>>
>>> The ... on a line alone or where element content would appear generally
>> means 1 or more other siblings. The way the display above starts with ...
>> means that this is a relative inner nest, not starting from the absolute
>> root.
>>>
>>> The ... within simple content means that content is elided to fit on one
>> line. Always follows some text characters to differentiate from the
>> child-element context.
>>>
>>> The ??? means zero or more other siblings.
>>>
>>> I used bold italic above to point out that the current node would be
>> highlighted somehow. Probably a way to do this that doesn't require display
>> modes would be useful. E.g., a text marker like ">>>" as in:
>>>
>>>>>> <currentNode>value .... </...>
>>>
>>> might be better, particularly for a trace output being dumped to a text
>> file.
>>>
>>> I made the above example an unparser kind of example by showing a
>> following sibling that exists that is after the current node.
>>>
>>> I think the key concept is that any sibling node is displayed in a way
>> that fits on one line.
>>> E.g., even if the element name was really long, I'd suggest:
>>>
>>>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>>>
>>> Where the element name itself gets elided because it is too long.
>>>
>>> A thought. Note that the above presentation is shown as quasi-XML, but
>> there's nothing XML-specific about it. A JSON-friendly equivalent could be
>> done as well:
>>>
>>> enclosingParent1 = {
>>>    ...
>>>    priorSibling2 = "89ab782..."
>>>    priorSibling1 = "some text is here and some more text"
>>>    currentNode = "value might be some big thing which needs to be elided ..."
>>>    followingSibling1 = { ... }
>>>    ???
>>> }
>>>
>>> That's enough for 1 email thread on this debug topic.
>>>
>>>
>>
>>
> 


Re: The future of the daffodil DFDL schema debugger?

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
I'd like "report a bug" as an IDE-supported action that actually helps the user create a TDML file, annotate it with their commentary about the issue as well as just the raw schema and expected result, etc. Then upload or send it to the right place.

Similarly, an IDE-supported "ask a question" action that helps the user create a TDML file for discussion of their question would be useful. Many times people just ask for clarifications, and little tests that make the context clear/concrete and are actually runnable save much time and back-and-forth over unspecified aspects of the situation.


________________________________
From: John Wass <jw...@gmail.com>
Sent: Friday, January 8, 2021 4:45 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: The future of the daffodil DFDL schema debugger?

What other features could find a nice home in an IDE integration?  Having a
single convenient entry point (the IDE) for such things would be nice, imo.

Things like...

- Rich set of actions for TDML
  - Run a single test from a TDML file
  - Debug/Run TDML
- Run/Debug a data file with a schema from the project
  - i.e., right-click on a JPG and have a context menu for Run with Daffodil ->
pick from list of dfdl.xsd
...



On Fri, Jan 8, 2021 at 2:47 PM Beckerle, Mike <mb...@owlcyberdefense.com>
wrote:

> Use cases or quasi-requirements. This is my summary so far.
>
> 1) capture a human-readable trace of parse/unparse information to a single
> text file (might be same as 2 if machine-readable is sufficiently human
> readable)
>
> 2) capture a machine-readable trace of parse/unparse information to a
> single text file (might be same as 1 if human readable form is also machine
> readable)
>
> 3) interactive debug from a command line - each display of information is
> requested by a specific command (1 and 2 above might be using this with a
> specific canned set of commands auto-issued to display various information,
> and capturing all to an output stream)
>
> 4) interactive debug with multi-panel display where displays are
> updated/animated automatically as debug context changes. (This is intended
> to mean more than just opening all the schema files in different editor
> windows - more than just gdb-style debug under Emacs.)
>
> 5) interactive debug time-machine - ability to backup to prior
> parser/unparser states, move forward again, or just backup and re-check
> something, but then jump forward to proceed from where one left off.
>
> 6) Non Use Case: IDE for DFDL with rich semantic model (akin to the DSOM
> object model) of the schema.
> This is here just to point out that it's really out of scope. There are
> many questions about the schema (e.g., "can I add this property to this
> element?") that are not required for the debugger. A full and powerful IDE
> is great, but that's really entirely different than our goals for debugging
> that we're trying to discuss here.
>
>
> ________________________________
> From: Sloane, Brandon <bs...@owlcyberdefense.com>
> Sent: Thursday, January 7, 2021 1:25 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> We could also create a new flag for --trace that would format the trace
> output in a more machine readable manner. This should let us accomplish
> Larry's goals, and most of mine, with relatively little effort within
> Daffodil (but still all the effort on the GUI side), and would allow for
> off-site analysis in cases where it is not practical to attach a debugger
> while Daffodil is running.
> ________________________________
> From: Sloane, Brandon <bs...@owlcyberdefense.com>
> Sent: Thursday, January 7, 2021 1:21 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> I've been thinking about a tool along similar lines (although more
> integrated with Daffodil than post-processing the trace output).
>
> One thing to keep in mind is that, although the trace output is presented
> as a linear log (since we do not have much choice), the actual process is
> more of a tree, due to backtracking.
>
> Ideally, we would have a multi-pane window showing:
>
>
>   *   The hex/binary data
>   *   The infoset
>   *   A time-axis parse tree; with a "major" node at every point of
> uncertainty and parse error, and "minor" nodes at every parse step
>   *   A view of the DFDL schema
>   *   An interactive terminal debugger (e.g. what we currently have)
>   *   Breakpoints/variables/delimiter-stack/etc
>
> Within these panes, you ought to be able to select a given region/element,
> and highlight all the corresponding elements in the other panes.
>
> I think that exporting the necessary information from Daffodil to
> implement all of this would be relatively straightforward. The only
> potentially problematic parts I see are:
>
>   *   The interactive debugger would require some form of time-travel to
> implement (I think most of the work for this is done to support backtracking)
>   *   The memory requirements when used on large infosets
>
> ________________________________
> From: Larry Barber <la...@nteligen.com>
> Sent: Thursday, January 7, 2021 1:08 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: RE: The future of the daffodil DFDL schema debugger?
>
> When I was doing strange and unusual things with DFDL and generating a lot
> of errors, I envisioned how helpful it would be to have a tool that would
> post-process the --trace output and use it to display a dual-pane window
> (like the editor referenced below) with the schema on one side and the hex
> version on the other, with a slider that would allow me to step through the
> parsing actions and see pointers to where the parser was in both the
> schema and input files. In other words, just convert the information from
> the --trace output into a more useful graphical display.
> Perhaps breakpoint-like markers could be added to both files to quickly
> scan through and display which sections of the schema read which locations
> in the file, or vice versa.
>
> -----Original Message-----
> From: Steve Lawrence [mailto:slawrence@apache.org]
> Sent: Wednesday, January 6, 2021 1:42 PM
> To: dev@daffodil.apache.org
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> Yep, something like that seems very reasonable for dealing with large
> infosets. But it feels like we would still run into usability issues.
> For example, what if a user wants to see more? We need some configuration
> options to increase what we've elided. It's not big, but every new thing
> that needs configuration adds complexity and decreases usability.
>
> And I think the only reason we are trying to spend effort eliding things
> is because we're limited to this gdb-like interface where you can only
> print out a little information at a time.
>
> I think what would really help is to dump this gdb interface and instead use
> multiple windows/views. As a really close example to what I imagine, I
> recently came across this hex editor:
>
>
> https://www.synalysis.net/
>
> The screenshots are a bit small so it's not super clear, but this tool has
> one view for the data in hex, and one view for a tree of parsed results
> (which is very similar to our infoset). The "infoset" view has information
> like offset/length/value, and can be related back to the data view to find
> the actual bits.
>
> I imagine the "next generation Daffodil debugger" to look much like this.
> As data is parsed, the infoset view fills up. This view could act like a
> standard GUI tree so you could collapse sections or scroll around to show
> just the parts you care about, and have search capabilities to quickly jump
> around. The advantage here is you no longer really need automated eliding
> or heuristics for what the user *might* care about.
> You just show the whole thing and let the user scroll around. As Daffodil
> parses and backtracks, this tree grows or shrinks.
>
> I also imagine you could have a cursor moving around the hex view, so as
> Daffodil moves around (e.g. scanning for delimiters, extracting integers),
> one could update this data view to show what Daffodil is doing and where it
> is.
>
> I also imagine there could be other views as well. For example, a schema
> view to show where in the schema Daffodil is, and to add/remove
> breakpoints. And an information view for things like variables, in-scope
> delimiters, PoUs, etc.
>
> The only reason I mention a debug protocol is that it would allow this GUI to
> be more easily written in something other than Java/Scala to take advantage
> of other GUI toolkits. It's been a long while since I've done anything with
> Java GUIs, but they seemed pretty poor the last time I looked at them. It
> would even allow for a TUI, which Java has little/no support for. It also
> enables things like remote debugging if a socket-based IPC were used. Though
> I'm not sure all of that is necessary. Just thinking what would be ideal, and
> it can always be pared back.
>
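[Editor's sketch] To make the protocol idea above more concrete, here is a minimal sketch of how one message might be framed if such a protocol borrowed DAP's wire format (a Content-Length header followed by a UTF-8 JSON body). The event name "infosetChanged" and all of the body fields are hypothetical; nothing like this has been defined for Daffodil.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame a JSON payload with a DAP-style Content-Length header."""
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> dict:
    """Parse one framed message back into a dict."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = int(header.split(b":", 1)[1].decode("ascii").strip())
    return json.loads(rest[:length].decode("utf-8"))


# Hypothetical event telling a front end that one infoset node was added.
event = {
    "seq": 1,
    "type": "event",
    "event": "infosetChanged",  # assumed name, not part of DAP
    "body": {"change": "added", "path": "/record/length", "bitOffset": 0},
}

wire = encode_message(event)
assert decode_message(wire) == event
```

Because the messages are plain JSON over a byte stream, a front end in any language could consume them over a pipe or a socket, which is the decoupling the message above is after.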
>
> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> > I don't think of it as a Daffodil debug protocol, but just a separation
> > of concerns between display of information and the behaviors of
> > parse/unparse that need to be points where users can pause, and data
> > structures available to display.
> >
> > E.g., it is 100% a display issue that the infoset (shown as XML) is
> > clumsy, too big, etc. The infoset is available in the processor state, and
> > one can examine the current node, enclosing node, prior sibling(s),
> > following sibling(s), etc. One can elide contents that are too big for
> > hexBinary, etc.
> >
> > I think this problem, how to display the infoset with sensible limits on
> > sizing, is fairly easy to come up with some design for, that will at least
> > be (1) always fairly small (2) much more useful in more cases. It won't be
> > perfect but can be much better than what we do now.
> >
> > One sensible display "mode" should be that displaying the context
> > surrounding the current element (when parsing or unparsing) displays
> > at most N lines (N/2 before, N/2 after) with a maximum length of L
> > characters (settable within reason?).
> >
> > Sibling and enclosing nodes would be displayed eliding their contents to
> > at most 1 line.
> >
> > Here's an example of what I mean. Displaying up to N=10 lines total:
> >
> > ...
> > <enclosingParent1>
> >    ...
> >    <priorSibling2>89ab782 ...</...>
> >    <priorSibling1>some text is here and some more text</...>
> >    <currentNode>value might be some big thing which needs to be elided ...</...>
> >    <followingSibling1> ... </...>
> >    ???
> > </enclosingParent1>
> > ???
> >
> > The </...> is just an idea to reduce XML matching end-tag clutter.
> >
> > The ... on a line alone or where element content would appear generally
> > means 1 or more other siblings. The way the display above starts with ...
> > means that this is a relative inner nest, not starting from the absolute
> > root.
> >
> > The ... within simple content means that content is elided to fit on one
> > line. It always follows some text characters to differentiate it from the
> > child-element context.
> >
> > The ??? means zero or more other siblings.
> >
> > I used bold italic above to point out that the current node would be
> > highlighted somehow. Probably a way to do this that doesn't require display
> > modes would be useful. E.g., a text marker like ">>>" as in:
> >
> > >>> <currentNode>value ... </...>
> >
> > might be better, particularly for a trace output being dumped to a text
> > file.
> >
> > I made the above example an unparser kind of example by showing a
> > following sibling that exists after the current node.
> >
> > I think the key concept is that any sibling node is displayed in a way
> > that fits on one line.
> > E.g., even if the element name was really long, I'd suggest:
> >
> >   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> >
> > Where the element name itself gets elided because it is too long.
> >
> > A thought. Note that the above presentation is shown as quasi-XML, but
> > there's nothing XML-specific about it. A JSON-friendly equivalent could be
> > done as well:
> >
> > enclosingParent1 = {
> >    ...
> >    priorSibling2 = "89ab782..."
> >    priorSibling1 = "some text is here and some more text"
> >    currentNode = "value might be some big thing which needs to be elided ..."
> >    followingSibling1 = { ... }
> >    ???
> > }
> >
> > That's enough for 1 email thread on this debug topic.
> >
> >
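[Editor's sketch] The bounded display described above (at most N lines, each at most L characters, the current node marked with ">>>") can be sketched roughly as follows. This is a simplified illustration and not Daffodil code: the function names, the (name, value) sibling representation, and the default limits are all assumptions, and it only handles one level of simple-valued siblings under a single parent.

```python
def elide(text: str, limit: int) -> str:
    """Truncate text to at most `limit` characters, marking the cut with ' ...'."""
    if len(text) <= limit:
        return text
    return text[: max(limit - 4, 0)] + " ..."


def render_context(parent, siblings, current, max_lines=10, max_len=60):
    """Render the siblings around index `current`, one line per sibling.

    siblings is a list of (name, value) pairs for simple elements.
    """
    # Reserve two lines for the parent tags and two for "..." markers.
    budget = max(max_lines - 4, 1)
    first = max(current - budget // 2, 0)
    last = min(first + budget, len(siblings))
    out = [f"<{parent}>"]
    if first > 0:
        out.append("   ...")  # one or more earlier siblings omitted
    for i in range(first, last):
        name, value = siblings[i]
        marker = ">>>" if i == current else "   "
        # Leave room for the 6-character "</...>" end tag.
        out.append(elide(f"{marker}<{name}>{value}", max_len - 6) + "</...>")
    if last < len(siblings):
        out.append("   ...")  # one or more later siblings omitted
    out.append(f"</{parent}>")
    return out


siblings = [(f"field{i}", "89ab782" * 20) for i in range(20)]
lines = render_context("enclosingParent1", siblings, current=10, max_len=40)
print("\n".join(lines))
```

The design choice here is that the limits are enforced on the rendered lines rather than on the infoset, so the same routine could feed either an interactive display or a trace file.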
>
>

Re: The future of the daffodil DFDL schema debugger?

Posted by John Wass <jw...@gmail.com>.
What other features could find a nice home in an IDE integration? Having a
single convenient entry point (the IDE) for such things would be nice, imo.

Things like...

- Rich set of actions for TDML
  - Run a single test from a TDML file
  - Debug/Run TDML
- Run/Debug a data file with a schema from the project
  - i.e., right-click on a JPG and have a context menu for Run with Daffodil ->
    pick from a list of dfdl.xsd files
...



On Fri, Jan 8, 2021 at 2:47 PM Beckerle, Mike <mb...@owlcyberdefense.com>
wrote:

> Use cases or quasi-requirements. This is my summary so far.
>
> 1) capture a human-readable trace of parse/unparse information to a single
> text file (might be same as 2 if machine-readable is sufficiently human
> readable)
>
> 2) capture a machine-readable trace of parse/unparse information to a
> single text file (might be same as 1 if human readable form is also machine
> readable)
>
> 3) interactive debug from a command line - each display of information is
> requested by a specific command (1 and 2 above might be using this with a
> specific canned set of commands auto-issued to display various information,
> and capturing all to an output stream)
>
> 4) interactive debug with multi-panel display where displays are
> updated/animated automatically as debug context changes. (This is intended
> to mean more than just opening all the schema files in different editor
> windows - more than just gdb-style debug under Emacs.)
>
> 5) interactive debug time-machine - ability to back up to prior
> parser/unparser states, move forward again, or just back up and re-check
> something, but then jump forward to proceed from where one left off.
>
> 6) Non Use Case: IDE for DFDL with rich semantic model (akin to the DSOM
> object model) of the schema.
> This is here just to point out that it's really out of scope. There are
> many questions about the schema (e.g., "can I add this property to this
> element?") that are not required for the debugger. A full and powerful IDE
> is great, but that's really entirely different from our goals for debugging
> that we're trying to discuss here.
>
>
> ________________________________
> From: Sloane, Brandon <bs...@owlcyberdefense.com>
> Sent: Thursday, January 7, 2021 1:25 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> We could also create a new flag for --trace that would format the trace
> output in a more machine readable manner. This should let us accomplish
> Larry's goals, and most of mine, with relatively little effort within
> Daffodil (but still all the effort on the GUI side), and would allow for
> off-site analysis in cases where it is not practical to attach a debugger
> while Daffodil is running.
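[Editor's sketch] As an illustration of what a machine-readable trace could enable, the sketch below writes trace records as JSON Lines (one JSON object per line) and then answers the question "which bits did each schema element consume?" offline. The record fields (event, element, startBit, endBit) are hypothetical; no such --trace output format exists in Daffodil as far as this thread states.

```python
import json


def write_trace(records):
    """Serialize trace records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def read_trace(text):
    """Parse JSON Lines text back into a list of records."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def spans_by_element(records):
    """Map each element name to the (startBit, endBit) ranges it consumed."""
    spans = {}
    for r in records:
        if r["event"] == "parsed":
            spans.setdefault(r["element"], []).append((r["startBit"], r["endBit"]))
    return spans


# Hypothetical records; a real trace would carry whatever Daffodil exposes.
records = [
    {"event": "parsed", "element": "length", "startBit": 0, "endBit": 16},
    {"event": "backtrack", "element": "payloadA", "startBit": 16, "endBit": 16},
    {"event": "parsed", "element": "payloadB", "startBit": 16, "endBit": 80},
]

text = write_trace(records)
print(spans_by_element(read_trace(text)))
```

A line-oriented format like this stays readable in a text editor while being trivial to parse, which relates to the question of whether the human-readable and machine-readable traces could be the same file.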

Re: The future of the daffodil DFDL schema debugger?

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
Use cases or quasi-requirements. This is my summary so far.

1) capture a human-readable trace of parse/unparse information to a single text file (might be same as 2 if machine-readable is sufficiently human readable)

2) capture a machine-readable trace of parse/unparse information to a single text file (might be same as 1 if human readable form is also machine readable)

3) interactive debug from a command line - each display of information is requested by a specific command (1 and 2 above might be using this with a specific canned set of commands auto-issued to display various information, and capturing all to an output stream)

4) interactive debug with multi-panel display where displays are updated/animated automatically as debug context changes. (This is intended to mean more than just opening all the schema files in different editor windows - more than just gdb-style debug under Emacs.)

5) interactive debug time-machine - ability to back up to prior parser/unparser states, move forward again, or just back up and re-check something, but then jump forward to proceed from where one left off.

6) Non Use Case: IDE for DFDL with rich semantic model (akin to the DSOM object model) of the schema.
This is here just to point out that it's really out of scope. There are many questions about the schema (e.g., "can I add this property to this element?") that are not required for the debugger. A full and powerful IDE is great, but that's really entirely different from the debugging goals that we're trying to discuss here.
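
To make use cases 1 and 2 a bit more concrete, here is a minimal sketch of a trace written as one JSON object per line, which stays greppable for a human while being trivially machine-readable. This is only an illustration: the field names (step, action, schema_path, bit_offset, detail) and the action values are assumptions, not Daffodil's actual trace format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TraceEvent:
    # All field names are hypothetical, not Daffodil's real trace output.
    step: int          # monotonically increasing event counter
    action: str        # e.g. "parse-start", "parse-end", "backtrack"
    schema_path: str   # where in the DFDL schema the processor is
    bit_offset: int    # position in the data stream, in bits
    detail: str = ""   # free-form text intended for human readers


def to_line(event):
    """Serialize one event as a single line of JSON (one record per line)."""
    return json.dumps(asdict(event), sort_keys=True)


def from_line(line):
    """Parse a line written by to_line back into a TraceEvent."""
    return TraceEvent(**json.loads(line))


events = [
    TraceEvent(1, "parse-start", "ex:record/ex:length", 0),
    TraceEvent(2, "parse-end", "ex:record/ex:length", 16, "value=12"),
]
trace_text = "\n".join(to_line(e) for e in events)
```

Because each record is a single line, the same file can be read in a pager or fed to a post-processing tool without a second output format.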


________________________________
From: Sloane, Brandon <bs...@owlcyberdefense.com>
Sent: Thursday, January 7, 2021 1:25 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: The future of the daffodil DFDL schema debugger?

We could also create a new flag for --trace that would format the trace output in a more machine-readable manner. This should let us accomplish Larry's goals, and most of mine, with relatively little effort within Daffodil (but still all the effort on the GUI side), and would allow for off-site analysis in cases where it is not practical to attach a debugger while Daffodil is running.
________________________________
From: Sloane, Brandon <bs...@owlcyberdefense.com>
Sent: Thursday, January 7, 2021 1:21 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: The future of the daffodil DFDL schema debugger?

I've been thinking about a tool along similar lines (although more integrated with Daffodil than post-processing the trace output).

One thing to keep in mind is that, although the trace output is presented as a linear log (since we do not have much choice), the actual process is more of a tree, due to backtracking.

Ideally, we would have a multi-pane window showing:


  *   The hex/binary data
  *   The infoset
  *   A time-axis parse tree; with a "major" node at every point of uncertainty and parse error, and "minor" nodes at every parse step
  *   A view of the DFDL schema
  *   An interactive terminal debugger (e.g. what we currently have)
  *   Breakpoints/variables/delimiter-stack/etc

Within these panes, you ought to be able to select a given region/element, and highlight all the corresponding elements in the other panes.

I think that exporting the necessary information from Daffodil to implement all of this would be relatively straightforward. The only potentially problematic parts I see are:

  *   The interactive debugger would require some form of time-travel to implement (I think most of the work for this is done to support backtracking)
  *   The memory requirements when used on large infosets
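
As a rough illustration of the time-axis parse tree, the sketch below folds a linear trace into a tree, with "major" nodes at points of uncertainty and parse errors and "minor" nodes at ordinary steps. The event kinds ("step", "pou", "error", "backtrack") are hypothetical, and the sketch deliberately ignores resolution of points of uncertainty, which a real tool would need to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    major: bool = False   # True at points of uncertainty and parse errors
    children: list = field(default_factory=list)


def build_tree(events):
    """Fold a linear list of (kind, label) events into a tree.

    Hypothetical kinds: "step" (ordinary parse step), "pou" (point of
    uncertainty), "error" (parse error), and "backtrack" (return to the
    most recent point of uncertainty and start a new branch there).
    """
    root = Node("root", major=True)
    current = root
    pou_stack = []  # points of uncertainty we can still return to
    for kind, label in events:
        if kind == "backtrack":
            current = pou_stack[-1] if pou_stack else root
            continue
        node = Node(label, major=kind in ("pou", "error"))
        current.children.append(node)
        current = node
        if kind == "pou":
            pou_stack.append(node)
    return root


events = [
    ("step", "header"),
    ("pou", "choice"),
    ("step", "branchA"),
    ("error", "bad delimiter"),
    ("backtrack", ""),
    ("step", "branchB"),
]
tree = build_tree(events)
```

In this example the "choice" node ends up with two children, one per attempted branch, which is the tree shape the linear log hides.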

________________________________
From: Larry Barber <la...@nteligen.com>
Sent: Thursday, January 7, 2021 1:08 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: RE: The future of the daffodil DFDL schema debugger?

When I was doing strange and unusual things with DFDL and generating a lot of errors, I envisioned how helpful it would be to have a tool that would post-process the --trace output and use it to display a dual pane window (like the editor referenced below) with the schema on one side and hex version on the other, with a slider that would allow me to flow through the parsing action and see pointers as to where the parser was in both the schema and input files. In other words just convert the information from the --trace into a more useful graphical display.
Perhaps breakpoint like markers could be added to both files to quickly scan through and display what sections of the schema read which locations in the file, or vice versa.

-----Original Message-----
From: Steve Lawrence [mailto:slawrence@apache.org]
Sent: Wednesday, January 6, 2021 1:42 PM
To: dev@daffodil.apache.org
Subject: Re: The future of the daffodil DFDL schema debugger?

Yep, something like that seems very reasonable for dealing with large infosets. But it still feels like we still run into usability issues.
For example, what if a user wants to see more? We need some configuration options to increase what we've elided. It's not big, but every new thing that needs configuration adds complexity and decreases usability.

And I think the only reason we are trying to spend effort eliding things is because we're limited to this gdb-like interface where you can only print out a little information at a time.

I think what would really help is to dump this gdb interface and instead use multiple windows/views. As a really close example to what I imagine, I recently came across this hex editor:

https://www.synalysis.net/

The screenshots are a bit small so it's not super clear, but this tool has one view for the data in hex, and one view for a tree of parsed results (which is very similar to our infoset). The "infoset" view has information like offset/length/value, and can be related back to the data view to find the actual bits.

I imagine the "next generation daffodil debugger" to look much like this. As data is parsed, the infoset view fills up. This view could act like a standard GUI tree so you could collapse sections or scroll around to show just the parts you care about, and have search capabilities to quickly jump around. The advantage here is you no longer really need automated eliding or heuristics for what the user *might* care about.
You just show the whole thing and let the user scroll around. As daffodil parses and backtracks, this tree grows or shrinks.

I also imagine you could have a cursor moving around the hex view, so as daffodil moves around (e.g. scanning for delimiters, extracting integers), one could update this data view to show what daffodil is doing and where it is.

I also imagine there could be other views as well. For example, a schema view to show where in the schema daffodil is, and to add/remove breakpoints. And an information view for things like variables, in-scope delimiters, PoU's, etc.

The only reason I mention a debug protocol is that it would allow this GUI to be more easily written in something other than Java/Scala to take advantage of other GUI toolkits. It's been a long while since I've done anything with Java GUIs, but they seemed pretty poor the last time I looked at them. It would even allow for a TUI, which Java has little/no support for. It also enables things like remote debugging if a socket IPC was used. Though I'm not sure all of that is necessary. Just thinking what would be ideal, and it can always be pared back.
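
For a sense of how small the wire format of such a protocol could be, here is a sketch that borrows DAP's framing (a Content-Length header, a blank line, then a JSON body) and its seq/type/event/body message shape. The event name "infosetChanged" and everything in its body are made up for illustration, and the decoder assumes Content-Length is the only header field present.

```python
import json


def encode_message(message):
    """Frame a JSON message DAP-style: Content-Length header, blank line, body."""
    body = json.dumps(message).encode("utf-8")
    header = "Content-Length: {}\r\n\r\n".format(len(body)).encode("ascii")
    return header + body


def decode_message(data):
    """Decode one framed message; return (message, remaining_bytes).

    Assumes the header block contains only a Content-Length field.
    """
    header, _, rest = data.partition(b"\r\n\r\n")
    length = int(header.decode("ascii").split(":", 1)[1].strip())
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


# Hypothetical event a Daffodil backend might send when the infoset changes.
infoset_event = {
    "seq": 1,
    "type": "event",
    "event": "infosetChanged",          # assumed name, not part of DAP
    "body": {"path": "ex:record/ex:length", "change": "added"},
}
wire = encode_message(infoset_event)
```

Since the framing is just bytes over a stream, the same messages could travel over a pipe or a socket, which is what would make remote debugging or a non-JVM front end possible.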


On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> I don't think of it as a daffodil debug protocol, but just a separation of concerns between display of information and the behaviors of parse/unparse that need to be points where users can pause, and data structures available to display.
>
> E.g., it is 100% a display issue that the infoset (shown as XML) is clumsy, too big, etc.  The infoset is available in the processor state, and one can examine the current node, enclosing node, prior sibling(s), following sibling(s), etc. One can elide contents that are too big for hexBinary, etc.
>
> I think this problem, how to display the infoset with sensible limits on sizing, is fairly easy to come up with some design for, that will at least be (1) always fairly small (2) much more useful in more cases. It won't be perfect but can be much better than what we do now.
>
> One sensible display "mode" should be that displaying the context
> surrounding the current element (when parsing or unparsing) displays
> at most N-lines. (N/2 before, N/2 after) with a maximum length of L
> characters (settable within reason ?)
>
> Sibling and enclosing nodes would be displayed eliding their contents to at most 1 line.
>
> Here's an example of what I mean. Displaying up to N=10 lines total:
>
> ...
> <enclosingParent1>
>    ...
>    <priorSibling2>89ab782 ...</...>
>    <priorSibling1>some text is here and some more text</...>
>    <currentNode>value might be some big thing which needs to be elided ...</...>
>    <followingSibling1> ... </...>
>    ???
> </enclosingParent1>
> ???
>
> The </...> is just an idea to reduce XML matching end-tag clutter.
>
> The ... on a line alone or where element content would appear generally means 1 or more other siblings. The way the display above starts with ... means that this is a relative inner nest, not starting from the absolute root.
>
> The ... within simple content means that content is elided to fit on one line. Always follows some text characters to differentiate from the child-element context.
>
> The ??? means zero or more other siblings.
>
> I used bold italic above to point out that the current node would be highlighted somehow. Probably a way to do this that doesn't require display modes would be useful. E.g., a text marker like ">>>" as in:
>
> >>> <currentNode>value ... </...>
>
> might be better, particularly for a trace output being dumped to a text file.
>
> I made the above example an unparser kind of example by showing a following sibling that exists that is after the current node.
>
> I think the key concept is that any sibling node is displayed in a way that fits on one line.
> E.g., even if the element name was really long, I'd suggest:
>
>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>
> Where the element name itself gets elided because it is too long.
>
> A thought. Note that the above presentation is shown as quasi-XML, but there's nothing XML-specific about it. A JSON-friendly equivalent could be done as well:
>
> enclosingParent1 = {
>    ...
>    priorSibling2 = "89ab782..."
>    priorSibling1 = "some text is here and some more text"
>    currentNode = "value might be some big thing which needs to be elided ..."
>    followingSibling1 = { ... }
>    ???
> }
>
> That's enough for 1 email thread on this debug topic.
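
The display mode described above could be sketched roughly as follows. This is an illustration only: the function names, the (name, value) tuple representation of siblings, and the 20-character cap on element names are assumptions, and the sketch shows up to n_lines // 2 siblings on each side of the current node rather than enforcing an exact line total.

```python
def elide(text, max_len):
    """Truncate text to at most max_len characters, marking the cut with ' ...'."""
    if len(text) <= max_len:
        return text
    return text[: max(max_len - 4, 0)] + " ..."


def render_context(parent, siblings, current, n_lines=10, max_len=60):
    """Render the siblings around index `current` as quasi-XML, one line each.

    siblings is a list of (name, value) pairs; the current node is marked
    with ">>>" so no bold/italic display mode is needed.
    """
    half = n_lines // 2
    first = max(current - half, 0)
    last = min(current + half, len(siblings) - 1)
    lines = ["<{}>".format(parent)]
    if first > 0:
        lines.append("   ...")          # earlier siblings not shown
    for i in range(first, last + 1):
        name, value = siblings[i]
        marker = ">>>" if i == current else "   "
        lines.append(
            "{}<{}>{}</...>".format(marker, elide(name, 20), elide(value, max_len))
        )
    if last < len(siblings) - 1:
        lines.append("   ...")          # later siblings not shown
    lines.append("</{}>".format(parent))
    return lines


siblings = [("e{}".format(i), "v" * 100) for i in range(20)]
lines = render_context("parent", siblings, 10, n_lines=4, max_len=12)
```

The output size depends only on n_lines and max_len, never on how large the infoset or any single value is, which is the property that matters for large hexBinary blobs.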
>
>
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Tuesday, January 5, 2021 2:26 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: The future of the daffodil DFDL schema debugger?
>
>
> Now that we're in a new year, I'd like to start a discussion about the
> Daffodil DFDL Schema debugger and how it might be improved to be more
> useful.
>
> Note that this is not the capabilities to debug Daffodil itself in
> something like Eclipse/IntelliJ, but the ability for Daffodil to
> provide enough extra information during a parse/unparse so that a
> schema developer can get an idea of what Daffodil is doing. This makes
> it easier for users (rather than developers) to determine why a schema
> isn't giving the expected parse/unparse result (either because of bad
> data or a faulty schema).
>
> The current state of the debugger is enabled by providing the --debug
> or --trace flags in the CLI. More information about that here:
>
> https://daffodil.apache.org/debugger/
>
> This enables a TUI and commands somewhat similar to GDB, providing
> things like breakpoints, steps, displaying the current infoset, displaying
> a dump of the data, etc.
>
> Although I find this tool pretty useful, it definitely has some
> glaring issues.
>
> The most glaring to me is that it really isn't useful at all for
> debugging unparse. The data dumps only include the main output stream,
> so determining things like suspensions and buffered output is impossible.
>
> Another issue is the infoset output. When outputting the infoset, the
> debugger currently just walks the entire thing and converts it to XML
> and displays the XML. For large infosets, this is excessive and can make
> it impossible to use, even with some configurations that limit how much
> of that infoset is actually printed to the screen. Also things like
> large hex binary blobs create excessive and unusable output.
>
> Another thing I feel is missing is a schema view. Right now it's very
> difficult to know where in the schema Daffodil actually is.
>
> I think these issues just need some thought and improvement. One could
> imagine a better way to stringify our unparse buffers for debug. One
> could imagine a way to receive infoset state changes so the debugger can
> track things like backtracks and remove infosets. One could imagine a
> way to display the schema.
>
> We just need a better way to stringify the current state of the
> unparse data including buffers, and we need a way for the debugger
> to receive state change information about the infoset so it can update
> displays rather than just constantly printing the entire infoset.
>
> However, I think another big issue is just usability in general.
> I think the CLI usage is reasonable, but it's not always user
> friendly, and it is difficult to view multiple things at the same time. I
> think because of this very few people even use this tool. So this seems
> like perhaps something worth focusing on.
>
> My first thought to improving this usability issue would be to
> implement the Debug Adapter Protocol (DAP)
> (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
> which many IDEs implement. With this implemented, Daffodil could be
> plugged in to any IDE that supports it and essentially get debugging
> for free, without the need to worry about the GUI elements.
>
> I do have concerns that this just wouldn't have enough functionality
> that we'd really need. For example, DAP really only has the ability to
> show code (Daffodil's equivalent is the DFDL schema). There isn't a way to
> show a live view of the infoset or data. Most DAP IDEs do have a
> console output, so we could potentially make it so the console output
> is a live view of infoset/data. But I'm not even sure most DAP
> friendly IDEs could support this kind of console output. Does anyone
> have familiarity with DAP IDEs and what kinds of console
> capabilities are available?
>
> I also looked into TUI libraries with the idea that we could just
> extend our current debugger user interface to be a bit friendlier.
> Unfortunately, there aren't too many Java/Scala TUI libraries and
> those that do exist don't have Apache friendly licenses. We also want
> to be careful about increasing dependencies just for a debugger that
> many people might not use, so large graphics libraries are probably out of the question.
>
> This also makes me wonder if an approach worth taking for the future
> of Daffodil schema debugging is developing a sort of "Daffodil Debug
> Protocol". I imagine it would be loosely based on DAP (which is
> essentially JSON message based) but could be targeted to the things
> that a DFDL schema debugger would really need. An added benefit with
> some sort of protocol is the debugger interface can be uncoupled from
> Daffodil itself, so we could implement a TUI/GUI/whatever in any
> language/GUI framework and just have it communicate the protocol over
> some form of IPC. Another benefit is that any future backends could
> implement this protocol and so a single debugger could hook into
> different backends without much issue. Unfortunately, defining such a
> protocol might be a large task, but we do have our existing debug
> infrastructure and things like DAP to guide its development/design.
>
> Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
> we really just need the few improvements mentioned to the existing
> debugger. Is that enough to make it usable? Or is an entirely
> different approach needed to debugging schemas?
>


Re: The future of the daffodil DFDL schema debugger?

Posted by "Sloane, Brandon" <bs...@owlcyberdefense.com>.
We could also create a new flag for --trace that would format the trace output in a more machine-readable manner. This should let us accomplish Larry's goals, and most of mine, with relatively little effort within Daffodil (but still all the effort on the GUI side), and would allow for off-site analysis in cases where it is not practical to attach a debugger while Daffodil is running.
________________________________
From: Sloane, Brandon <bs...@owlcyberdefense.com>
Sent: Thursday, January 7, 2021 1:21 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: The future of the daffodil DFDL schema debugger?

I've been thinking about a tool along similar lines (although more integrated with Daffodil than post-processing the trace output).

One thing to keep in mind is that, although the trace output is presented as a linear log (since we do not have much choice), the actual process is more of a tree, due to backtracking.

Ideally, we would have a multi-pane window showing:


  *   The hex/binary data
  *   The infoset
  *   A time-axis parse tree; with a "major" node at every point of uncertainty and parse error, and "minor" nodes at every parse step
  *   A view of the DFDL schema
  *   An interactive terminal debugger (e.g. what we currently have)
  *   Breakpoints/variables/delimiter-stack/etc

Within these panes, you ought to be able to select a given region/element, and highlight all the corresponding elements in the other panes.

I think that exporting the necessary information from Daffodil to implement all of this would be relatively straightforward. The only potentially problematic parts I see are:

  *   The interactive debugger would require some form of time-travel to implement (I think most of the work for this is done to support backtracking)
  *   The memory requirements when used on large infosets

________________________________
From: Larry Barber <la...@nteligen.com>
Sent: Thursday, January 7, 2021 1:08 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: RE: The future of the daffodil DFDL schema debugger?

When I was doing strange and unusual things with DFDL and generating a lot of errors, I envisioned how helpful it would be to have a tool that would post-process the --trace output and use it to display a dual pane window (like the editor referenced below) with the schema on one side and hex version on the other, with a slider that would allow me to flow through the parsing action and see pointers as to where the parser was in both the schema and input files. In other words just convert the information from the --trace into a more useful graphical display.
Perhaps breakpoint like markers could be added to both files to quickly scan through and display what sections of the schema read which locations in the file, or vice versa.

-----Original Message-----
From: Steve Lawrence [mailto:slawrence@apache.org]
Sent: Wednesday, January 6, 2021 1:42 PM
To: dev@daffodil.apache.org
Subject: Re: The future of the daffodil DFDL schema debugger?

Yep, something like that seems very reasonable for dealing with large infosets. But it still feels like we still run into usability issues.
For example, what if a user wants to see more? We need some configuration options to increase what we've elided. It's not big, but every new thing that needs configuration adds complexity and decreases usability.

And I think the only reason we are trying to spend effort eliding things is because we're limited to this gdb-like interface where you can only print out a little information at a time.

I think what would really help is to dump this gdb interface and instead use multiple windows/views. As a really close example to what I imagine, I recently came across this hex editor:

https://www.synalysis.net/

The screenshots are a bit small so it's not super clear, but this tool has one view for the data in hex, and one view for a tree of parsed results (which is very similar to our infoset). The "infoset" view has information like offset/length/value, and can be related back to the data view to find the actual bits.

I imagine the "next generation daffodil debugger" to look much like this. As data is parsed, the infoset view fills up. This view could act like a standard GUI tree so you could collapse sections or scroll around to show just the parts you care about, and have search capabilities to quickly jump around. The advantage here is you no longer really need automated eliding or heuristics for what the user *might* care about.
You just show the whole thing and let the user scroll around. As daffodil parses and backtracks, this tree grows or shrinks.

I also imagine you could have a cursor moving around the hex view, so as daffodil moves around (e.g. scanning for delimiters, extracting integers), one could update this data view to show what daffodil is doing and where it is.

I also imagine there could be other views as well. For example, a schema view to show where in the schema daffodil is, and to add/remove breakpoints. And an information view for things like variables, in-scope delimiters, PoU's, etc.

The only reason I mention a debug protocol is that it would allow this GUI to be more easily written in something other than Java/Scala to take advantage of other GUI toolkits. It's been a long while since I've done anything with Java GUIs, but they seemed pretty poor the last time I looked at them. It would even allow for a TUI, which Java has little/no support for. It also enables things like remote debugging if a socket IPC was used. Though I'm not sure all of that is necessary. Just thinking what would be ideal, and it can always be pared back.


On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> I don't think of it as a daffodil debug protocol, but just a separation of concerns between display of information and the behaviors of parse/unparse that need to be points where users can pause, and data structures available to display.
>
> E.g., it is 100% a display issue that the infoset (shown as XML) is clumsy, too big, etc.  The infoset is available in the processor state, and one can examine the current node, enclosing node, prior sibling(s), following sibling(s), etc. One can elide contents that are too big for hexBinary, etc.
>
> I think this problem, how to display the infoset with sensible limits on sizing, is fairly easy to come up with some design for, that will at least be (1) always fairly small (2) much more useful in more cases. It won't be perfect but can be much better than what we do now.
>
> One sensible display "mode" should be that displaying the context
> surrounding the current element (when parsing or unparsing) displays
> at most N-lines. (N/2 before, N/2 after) with a maximum length of L
> characters (settable within reason ?)
>
> Sibling and enclosing nodes would be displayed eliding their contents to at most 1 line.
>
> Here's an example of what I mean. Displaying up to N=10 lines total:
>
> ...
> <enclosingParent1>
>    ...
>    <priorSibling2>89ab782 ...</...>
>    <priorSibling1>some text is here and some more text</...>
>    <currentNode>value might be some big thing which needs to be elided ...</...>
>    <followingSibling1> ... </...>
>    ???
> </enclosingParent1>
> ???
>
> The </...> is just an idea to reduce XML matching end-tag clutter.
>
> The ... on a line alone or where element content would appear generally means 1 or more other siblings. The way the display above starts with ... means that this is a relative inner nest, not starting from the absolute root.
>
> The ... within simple content means that content is elided to fit on one line. Always follows some text characters to differentiate from the child-element context.
>
> The ??? means zero or more other siblings.
>
> I used bold italic above to point out that the current node would be highlighted somehow. Probably a way to do this that doesn't require display modes would be useful. E.g., a text marker like ">>>" as in:
>
> >>> <currentNode>value ... </...>
>
> might be better, particularly for a trace output being dumped to a text file.
>
> I made the above example an unparser kind of example by showing a following sibling that exists that is after the current node.
>
> I think the key concept is that any sibling node is displayed in a way that fits on one line.
> E.g., even if the element name was really long, I'd suggest:
>
>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>
> Where the element name itself gets elided because it is too long.
>
> A thought. Note that the above presentation is shown as quasi-XML, but there's nothing XML-specific about it. A JSON-friendly equivalent could be done as well:
>
> enclosingParent1 = {
>    ...
>    priorSibling2 = "89ab782..."
>    priorSibling1 = "some text is here and some more text"
>    currentNode = "value might be some big thing which needs to be elided ..."
>    followingSibling1 = { ... }
>    ???
> }
>
> That's enough for 1 email thread on this debug topic.
>
>
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Tuesday, January 5, 2021 2:26 PM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: The future of the daffodil DFDL schema debugger?
>
>
> Now that we're in a new year, I'd like to start a discussion about the
> Daffodil DFDL Schema debugger and how it might be improved to be more
> useful.
>
> Note that this is not the capabilities to debug Daffodil itself in
> something like Eclipse/IntelliJ, but the ability for Daffodil to
> provide enough extra information during a parse/unparse so that a
> schema developer can get an idea of what Daffodil is doing. This makes
> it easier for users (rather than developers) to determine why a schema
> isn't giving the expected parse/unparse result (either because of bad
> data or a faulty schema).
>
> The current state of the debugger is enabled by providing the --debug
> or --trace flags in the CLI. More information about that here:
>
> https://daffodil.apache.org/debugger/
>
> This enables a TUI and commands somewhat similar to GDB, providing
> things like breakpoints, steps, displaying the current infoset, displaying
> a dump of the data, etc.
>
> Although I find this tool pretty useful, it definitely has some
> glaring issues.
>
> The most glaring to me is that it really isn't useful at all for
> debugging unparse. The data dumps only include the main output stream,
> so determining things like suspensions and buffered output is impossible.
>
> Another issue is the infoset output. When outputting the infoset, the
> debugger currently just walks the entire thing and converts it to XML
> and displays the XML. For large infosets, this is excessive and can make
> it impossible to use, even with some configurations that limit how much
> of that infoset is actually printed to the screen. Also things like
> large hex binary blobs create excessive and unusable output.
>
> Another thing I feel is missing is a schema view. Right now it's very
> difficult to know where in the schema Daffodil actually is.
>
> I think these issues just need some thought and improvement. One could
> imagine a better way to stringify our unparse buffers for debug. One
> could imagine a way to receive infoset state changes so the debugger can
> track things like backtracks and remove infosets. One could imagine a
> way to display the schema.
>
> We just need a better way to stringify the current state of the
> unparse data including buffers, and we need a way for the debugger
> to receive state change information about the infoset so it can update
> displays rather than just constantly printing the entire infoset.
>
> However, I think another big issue is just usability in general.
> I think the CLI usage is reasonable, but it's not always user
> friendly, and it is difficult to view multiple things at the same time. I
> think because of this very few people even use this tool. So this seems
> like perhaps something worth focusing on.
>
> My first thought to improving this usability issue would be to
> implement the Debug Adapter Protocol (DAP)
> (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
> which many IDEs implement. With this implemented, Daffodil could be
> plugged in to any IDE that supports it and essentially get debugging
> for free, without the need to worry about the GUI elements.
>
> I do have concerns that this just wouldn't have enough functionality
> that we'd really need. For example, DAP really only has the ability to
> show code (Daffodil's equivalent is the DFDL schema). There isn't a way to
> show a live view of the infoset or data. Most DAP IDEs do have a
> console output, so we could potentially make it so the console output
> is a live view of infoset/data. But I'm not even sure most DAP
> friendly IDEs could support this kind of console output. Does anyone
> have familiarity with DAP IDEs and what kinds of console
> capabilities are available?
>
> I also looked into TUI libraries with the idea that we could just
> extend our current debugger user interface to be a bit friendlier.
> Unfortunately, there aren't too many Java/Scala TUI libraries and
> those that do exist don't have Apache friendly licenses. We also want
> to be careful about increase dependencies just for a debugger than
> many people might not use, so large graphics libraries are probably out of the question.
>
> This allo makes me wonder if an approach worth taking for the future
> of Daffodil schema debugging is developing a sort of "Daffodil Debug
> Protocol". I imagine it would be loosely based on DAP (which is
> essentially JSON message based) but could be targeted to the things
> that a DFDL schema debugger would really need. An added benefit with
> some sort of protocol is the debugger interface can be uncoupled from
> Daffodil itself, so we could implement a TUI/GUI/whatever in any
> language/GUI framework and just have it communicate the protocol over
> some form of IPC. Another benefit is that any future backends could
> implement this protocol and so a single debugger could hook into
> different backends without much issue. Unfortunately, defining such a
> protocol might be a large task, but we do have our existing debug
> infrastructure and things like DAP to guide its development/design.
>
> Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
> we really just need the few improvements mentioned to the existing
> debugger. Is that enough to make it usable? Or is an entirely
> different approach needed to debugging schemas?
>


Re: The future of the daffodil DFDL schema debugger?

Posted by "Sloane, Brandon" <bs...@owlcyberdefense.com>.
I've been thinking about a tool along similar lines (although more integrated with Daffodil than post-processing the trace output).

One thing to keep in mind is that, although the trace output is presented as a linear log (since we do not have much choice), the actual process is more of a tree, due to backtracking.

Ideally, we would have a multi-pane window showing:


  *   The hex/binary data
  *   The infoset
  *   A time-axis parse tree; with a "major" node at every point of uncertainty and parse error, and "minor" nodes at every parse step
  *   A view of the DFDL schema
  *   An interactive terminal debugger (e.g. what we currently have)
  *   Breakpoints/variables/delimiter-stack/etc

Within these panes, you ought to be able to select a given region/element, and highlight all the corresponding elements in the other panes.

I think that exporting the necessary information from Daffodil to implement all of this would be relatively straightforward. The only potentially problematic parts I see are:

  *   The interactive debugger would require some form of time-travel to implement (I think most of the work for this is done to support backtracking)
  *   The memory requirements when used on large infosets



RE: The future of the daffodil DFDL schema debugger?

Posted by Larry Barber <la...@nteligen.com>.
When I was doing strange and unusual things with DFDL and generating a lot of errors, I envisioned how helpful it would be to have a tool that would post-process the --trace output and use it to display a dual pane window (like the editor referenced below) with the schema on one side and hex version on the other, with a slider that would allow me to flow through the parsing action and see pointers as to where the parser was in both the schema and input files. In other words, just convert the information from --trace into a more useful graphical display.
Perhaps breakpoint-like markers could be added to both files to quickly scan through and display what sections of the schema read which locations in the file, or vice versa.

-----Original Message-----
From: Steve Lawrence [mailto:slawrence@apache.org] 
Sent: Wednesday, January 6, 2021 1:42 PM
To: dev@daffodil.apache.org
Subject: Re: The future of the daffodil DFDL schema debugger?

Yep, something like that seems very reasonable for dealing with large infosets. But it feels like we still run into usability issues.
For example, what if a user wants to see more? We need some configuration options to increase what we've elided. It's not big, but every new thing that needs configuration adds complexity and decreases usability.

And I think the only reason we are trying to spend effort eliding things is because we're limited to this gdb-like interface where you can only print out a little information at a time.

I think what would really help is to dump this gdb interface and instead use multiple windows/views. As a really close example to what I imagine, I recently came across this hex editor:

https://www.synalysis.net/

The screenshots are a bit small so it's not super clear, but this tool has one view for the data in hex, and one view for a tree of parsed results (which is very similar to our infoset). The "infoset" view has information like offset/length/value, and can be related back to the data view to find the actual bits.
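
As a rough illustration of relating an infoset entry back to the data view, here is a minimal Python sketch. The InfosetNode class, its bit_offset/bit_length fields, and the bracket-style highlighting are assumptions made up for this example; none of this is an existing Daffodil API.

```python
from dataclasses import dataclass


@dataclass
class InfosetNode:
    # Hypothetical record: element name plus where its bits sit in the data.
    name: str
    bit_offset: int
    bit_length: int


def byte_range(node):
    """Return the half-open byte range [start, end) that contains the node's bits."""
    start = node.bit_offset // 8
    end = (node.bit_offset + node.bit_length + 7) // 8
    return start, end


def hex_view(data, node):
    """Render the data as hex, bracketing the bytes that belong to the node."""
    start, end = byte_range(node)
    cells = []
    for i, b in enumerate(data):
        cell = f"{b:02x}"
        cells.append(f"[{cell}]" if start <= i < end else f" {cell} ")
    return "".join(cells)


data = bytes([0x01, 0x02, 0xAB, 0xCD, 0xFF])
length_field = InfosetNode("length", bit_offset=16, bit_length=16)
print(hex_view(data, length_field))
```

Selecting the "length" element brackets bytes 2 and 3 in the hex dump. The reverse lookup (from a byte to the nodes covering it) could use the same offset/length pairs.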

I imagine the "next generation daffodil debugger" to look much like this. As data is parsed, the infoset view fills up. This view could act like a standard GUI tree so you could collapse sections or scroll around to show just the parts you care about, and have search capabilities to quickly jump around. The advantage here is you no longer really need automated eliding or heuristics for what the user *might* care about.
You just show the whole thing and let the user scroll around. As daffodil parses and backtracks, this tree grows or shrinks.

I also imagine you could have a cursor moving around the hex view, so as daffodil moves around (e.g. scanning for delimiters, extracting integers), one could update this data view to show what daffodil is doing and where it is.

I also imagine there could be other views as well. For example, a schema view to show where in the schema daffodil is, and to add/remove breakpoints. And an information view for things like variables, in-scope delimiters, PoU's, etc.

The only reason I mention a debug protocol is that it would allow this GUI to be more easily written in something other than Java/Scala to take advantage of other GUI toolkits. It's been a long while since I've done anything with Java GUIs, but they seemed pretty poor the last time I looked at them. It would even allow for a TUI, which Java has little/no support for. It also enables things like remote debugging if a socket IPC was used. Though I'm not sure all of that is necessary. Just thinking what would be ideal, and it can always be pared back.
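
To make the protocol idea a little more concrete, here is a minimal Python sketch of JSON events that a frontend could consume to keep its infoset view in sync, growing and shrinking on backtracks. The event names ("elementAdded", "backtracked"), the body fields, and the newline-delimited framing are all hypothetical choices for this example; no such protocol has been defined, and DAP itself frames messages differently.

```python
import json


def encode_event(event, **body):
    """Serialize one event as a single line of JSON (newline-delimited framing)."""
    return json.dumps({"type": "event", "event": event, "body": body}) + "\n"


def decode_message(line):
    """Parse one line of the stream back into a message dictionary."""
    return json.loads(line)


class InfosetMirror:
    """Frontend-side copy of the infoset, updated from events instead of re-printed."""

    def __init__(self):
        self.paths = []

    def apply(self, msg):
        body = msg["body"]
        if msg["event"] == "elementAdded":
            self.paths.append(body["path"])
        elif msg["event"] == "backtracked":
            # Keep only the first `keep` elements, dropping everything
            # created after the point of uncertainty.
            del self.paths[body["keep"]:]


# Simulated backend output: a choice branch is tried, fails, and is replaced.
stream = (
    encode_event("elementAdded", path="/msg/header")
    + encode_event("elementAdded", path="/msg/body/choiceA")
    + encode_event("backtracked", keep=1)
    + encode_event("elementAdded", path="/msg/body/choiceB")
)

mirror = InfosetMirror()
for line in stream.splitlines():
    mirror.apply(decode_message(line))
print(mirror.paths)
```

Because the messages are plain text lines, the same stream could be sent over a socket for remote debugging or read by a frontend written in any language.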


On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> I don't think of it as a daffodil debug protocol, but just a separation of concerns between display of information and the behaviors of parse/unparse that need to be points where users can pause, and data structures available to display.
> 
> E.g., it is 100% a display issue that the infoset (shown as XML) is clumsy, too big, etc.  The infoset is available in the processor state, and one can examine the current node, enclosing node, prior sibling(s), following sibling(s), etc. One can elide contents that are too big for hexBinary, etc.
> 
> I think this problem, how to display the infoset with sensible limits on sizing, is fairly easy to come up with some design for, that will at least be (1) always fairly small (2) much more useful in more cases. It won't be perfect but can be much better than what we do now.
> 
> One sensible display "mode" would be that displaying the context 
> surrounding the current element (when parsing or unparsing) shows 
> at most N lines (N/2 before, N/2 after) with a maximum length of L 
> characters (settable within reason?).
> 
> Sibling and enclosing nodes would be displayed eliding their contents to at most 1 line.
> 
> Here's an example of what I mean. Displaying up to N=10 lines total:
> 
> ...
> <enclosingParent1>
>    ...
>    <priorSibling2>89ab782 ...</...>
>    <priorSibling1>some text is here and some more text</...>
>    <currentNode>value might be some big thing which needs to be elided ...</...>
>    <followingSibling1> ... </...>
>    ???
> </enclosingParent1>
> ???
> 
> The </...> is just an idea to reduce XML matching end-tag clutter.
> 
> The ... on a line alone or where element content would appear generally means 1 or more other siblings. The way the display above starts with ... means that this is a relative inner nest, not starting from the absolute root.
> 
> The ... within simple content means that content is elided to fit on one line. Always follows some text characters to differentiate from the child-element context.
> 
> The ??? means zero or more other siblings.
> 
> I used bold italic above to point out that the current node would be highlighted somehow. Probably a way to do this that doesn't require display modes would be useful. E.g., a text marker like ">>>" as in:
> 
> >>> <currentNode>value ... </...>
> 
> might be better, particularly for a trace output being dumped to a text file.
> 
> I made the above example an unparser kind of example by showing a following sibling that already exists after the current node.
> 
> I think the key concept is that any sibling node is displayed in a way that fits on one line.
> E.g., even if the element name was really long, I'd suggest:
> 
>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> 
> Where the element name itself gets elided because it is too long.
> 
> A thought. Note that the above presentation is shown as quasi-XML, but there's nothing XML-specific about it. A JSON-friendly equivalent could be done as well:
> 
> enclosingParent1 = {
>    ...
>    priorSibling2 = "89ab782..."
>    priorSibling1 = "some text is here and some more text"
>    currentNode = "value might be some big thing which needs to be elided ..."
>    followingSibling1 = { ... }
>    ???
> }
> 
> That's enough for 1 email thread on this debug topic.
> 
> 
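
The elided display described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: siblings are given as simple (name, value) pairs rather than real infoset nodes, the function names are invented here, and a "..." line is emitted only when siblings are actually hidden (so the "???" marker is not needed).

```python
def elide(text, limit):
    """Truncate text to at most `limit` characters, marking the cut with '...'."""
    if len(text) <= limit:
        return text
    return text[: max(0, limit - 3)] + "..."


def render_context(parent, siblings, current, max_lines=10, max_chars=40):
    """Return at most `max_lines` lines showing the siblings around `current`."""
    # Reserve four lines: the parent's open/close tags and two "..." markers.
    window = max(1, max_lines - 4)
    first = max(0, current - window // 2)
    last = min(len(siblings), first + window)
    lines = [f"<{parent}>"]
    if first > 0:
        lines.append("   ...")
    for i in range(first, last):
        name, value = siblings[i]
        marker = ">>>" if i == current else "   "
        lines.append(
            f"{marker}<{elide(name, max_chars)}>{elide(value, max_chars)}</...>"
        )
    if last < len(siblings):
        lines.append("   ...")
    lines.append(f"</{parent}>")
    return lines


example = [
    ("priorSibling2", "89ab782" * 20),
    ("priorSibling1", "some text is here and some more text"),
    ("currentNode", "value might be some big thing which needs to be elided " * 5),
    ("followingSibling1", ""),
]
print("\n".join(render_context("enclosingParent1", example, current=2)))
```

Names and values share one limit here for brevity; separate limits for element names and content would be an easy extension.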


Re: The future of the daffodil DFDL schema debugger?

Posted by "Sloane, Brandon" <bs...@owlcyberdefense.com>.
I agree with Steve that the "ideal" debugger would involve a rich multi-pane GUI, and I like the idea of establishing a well-defined protocol to isolate the GUI from the main codebase. Using a protocol would also give us scriptability nearly for free, as users could leverage normal shell tools to script whatever debug automation/conveniences they need.

I'm not sure how well existing debugger protocols would work though (although if they do fit, it would save us a lot of effort). The type of debugging needed for Daffodil schema strikes me as fairly distinct from what you would typically expect from most debuggers.

On the subject of functionality, one feature that I would really like to see added is time travel. With the work we already do to support backtracking, it should be relatively simple to add support for fully restoring the parse state to a prior saved state; which would be a massive QoL improvement for the interactive debugger.
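
A minimal Python sketch of the time-travel idea follows. It assumes the parse state can be snapshotted as a plain value (a dictionary here) and copied after every step; the class and function names are invented for the example, and Daffodil's actual backtracking machinery is not modeled.

```python
import copy


class TimeTravelDebugger:
    """Keeps a snapshot after every step so the user can move back and forth."""

    def __init__(self, initial_state):
        self.history = [copy.deepcopy(initial_state)]
        self.position = 0

    def step(self, apply_step):
        """Run one step on a copy of the current state and record the result."""
        # Stepping forward after going back discards the old "future".
        del self.history[self.position + 1:]
        state = copy.deepcopy(self.history[self.position])
        apply_step(state)
        self.history.append(state)
        self.position += 1
        return state

    def back(self, steps=1):
        """Restore an earlier snapshot without re-running the parse."""
        self.position = max(0, self.position - steps)
        return self.history[self.position]


def parse_byte(state):
    # Stand-in for a real parse step: consume one byte and record an element.
    state["infoset"].append(f"byte{state['bit_pos'] // 8}")
    state["bit_pos"] += 8


dbg = TimeTravelDebugger({"bit_pos": 0, "infoset": []})
for _ in range(3):
    dbg.step(parse_byte)
print(dbg.back(2))
```

Copying the whole state on every step is the simplest possible choice and would be costly on large infosets; a real implementation would more likely record only the changes between steps.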

For the non-interactive tracer (and, to some extent, the interactive debugger), I think we may need to support varying levels of verbosity. In addition to a global verbosity level, we should also have some way to flag specific "things" to get more or less detail. Specifying exactly what a "thing" is is its own discussion, as even a simple type in the schema can end up having many different regions (prefix, suffix, padding, etc).
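
One possible shape for this, sketched under the assumption that a "thing" can be named by a slash-separated path (that naming, and the `Verbosity` class itself, are made up for illustration):

```python
class Verbosity:
    """Global verbosity level with per-"thing" overrides (hypothetical design)."""

    def __init__(self, global_level: int = 1):
        self.global_level = global_level
        self._overrides = {}  # path prefix -> level

    def set_override(self, path: str, level: int) -> None:
        self._overrides[path] = level

    def level_for(self, path: str) -> int:
        """Use the longest matching override prefix, else the global level."""
        best = None
        for prefix in self._overrides:
            if path == prefix or path.startswith(prefix + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        return self.global_level if best is None else self._overrides[best]


verbosity = Verbosity(global_level=1)
verbosity.set_override("msg/header", 3)          # more detail for the header
verbosity.set_override("msg/header/padding", 0)  # but silence its padding
print(verbosity.level_for("msg/header/length"))   # 3
print(verbosity.level_for("msg/header/padding"))  # 0
print(verbosity.level_for("msg/body"))            # 1
```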

At a high level, I think I see 2 ways forward:

1) Mike's suggestion: make incremental improvements to our existing tooling, focusing primarily on reducing the volume of information the user is exposed to.

2) Steve's idea: establish a debugging protocol and develop an external debugger.

I would add to (2) that we can develop an experimental debugger to play around with different design ideas much more easily than we could if the debugger were itself part of Daffodil proper. Since I don't think we have a solid idea of what this debugger looks like, I think this is valuable.

Additionally, even if we use our own non-standard protocol, implementing (2) would still make it far easier for someone to integrate Daffodil debugging facilities into third party applications.

In my mind, this is entirely a question of engineering effort. If we are trying to improve the debugger, (1) is a must-have, at least in the sense of improving the output of --trace, as that non-interactive interface is simple enough to be quickly usable in almost any configuration. Having said that, if we are going to do (2), we should do it first, as it would probably simplify the work needed for (1).

If we have the resources, (2) would result in a far superior product.

Additionally, I think the work needed for (2) could have benefits beyond simply debugging. For a while I have wanted a tool similar to Wireshark's dissectors, where we could provide a schema and then see the binary data and infoset side by side, along with how regions of the two map to each other.
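
A rough sketch of the data such a side-by-side view would need, assuming each infoset element can report the bit range it was parsed from. The `Region` type and the element paths are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    path: str        # infoset element path (hypothetical naming)
    bit_start: int   # first bit of the element in the data (0-based)
    bit_length: int  # number of bits the element occupies


def innermost_region_at(regions: List[Region], bit: int) -> Optional[Region]:
    """Return the smallest region containing `bit`, or None if none does."""
    hits = [r for r in regions if r.bit_start <= bit < r.bit_start + r.bit_length]
    return min(hits, key=lambda r: r.bit_length) if hits else None


regions = [
    Region("record", 0, 48),
    Region("record/length", 0, 16),
    Region("record/payload", 16, 32),
]
print(innermost_region_at(regions, 20).path)  # record/payload
```

Clicking a byte in the data view would then highlight the innermost element, and selecting an element would highlight its bit range.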
________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Wednesday, January 6, 2021 1:42 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: The future of the daffodil DFDL schema debugger?

Yep, something like that seems very reasonable for dealing with large
infosets. But it feels like we still run into usability issues.
For example, what if a user wants to see more? We need some
configuration options to increase what we've elided. It's not big, but
every new thing that needs configuration adds complexity and decreases
usability.

And I think the only reason we are trying to spend effort eliding
things is because we're limited to this gdb-like interface where you can
only print out a little information at a time.

I think what would really help is to dump this gdb interface and instead use
multiple windows/views. As a really close example to what I imagine, I
recently came across this hex editor:

https://www.synalysis.net/

The screenshots are a bit small so it's not super clear, but this tool
has one view for the data in hex, and one view for a tree of parsed
results (which is very similar to our infoset). The "infoset" view has
information like offset/length/value, and can be related back to the
data view to find the actual bits.

I imagine the "next generation daffodil debugger" to look much like
this. As data is parsed, the infoset view fills up. This view could act
like a standard GUI tree so you could collapse sections or scroll around
to show just the parts you care about, and have search capabilities to
quickly jump around. The advantage here is you no longer really need
automated eliding or heuristics for what the user *might* care about.
You just show the whole thing and let the user scroll around. As daffodil
parses and backtracks, this tree grows or shrinks.
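
As a sketch of how a view could follow those changes, assuming the parser reported "added" and "backtracked" events carrying an element path (both event kinds and the `InfosetView` class are hypothetical):

```python
class InfosetView:
    """Hypothetical tree view model, kept as a flat list of element paths."""

    def __init__(self):
        self.paths = []

    def apply(self, kind: str, path: str) -> None:
        if kind == "added":
            self.paths.append(path)
        elif kind == "backtracked":
            # Drop the element the parser backed out of, plus its descendants.
            self.paths = [
                p for p in self.paths
                if p != path and not p.startswith(path + "/")
            ]
        else:
            raise ValueError(f"unknown event kind: {kind}")


view = InfosetView()
for kind, path in [
    ("added", "msg"),
    ("added", "msg/choiceA"),
    ("added", "msg/choiceA/x"),
    ("backtracked", "msg/choiceA"),
    ("added", "msg/choiceB"),
]:
    view.apply(kind, path)
print(view.paths)  # ['msg', 'msg/choiceB']
```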

I also imagine you could have a cursor moving around the hex view, so as
daffodil moves around (e.g. scanning for delimiters, extracting
integers), one could update this data view to show what daffodil is
doing and where it is.

I also imagine there could be other views as well. For example, a schema
view to show where in the schema daffodil is, and to add/remove
breakpoints. And an information view for things like variables, in-scope
delimiters, PoU's, etc.

The only reason I mention a debug protocol is that it would allow this GUI
to be more easily written in something other than Java/Scala to take
advantage of other GUI toolkits. It's been a long while since I've done
anything with Java GUIs, but they seemed pretty poor the last time I looked
at them. It would even allow for a TUI, which Java has little/no support
for. It also enables things like remote debugging if a socket IPC were
used. Though I'm not sure all of that is necessary. Just thinking what
would be ideal, and it can always be pared back.
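
A minimal sketch of what a message on such a protocol might look like, borrowing DAP's framing (a Content-Length header followed by a JSON body). The event name and body fields are invented for illustration and are not part of DAP or Daffodil:

```python
import json


def encode_message(message: dict) -> bytes:
    """Frame a JSON message with a Content-Length header, as DAP does."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> dict:
    """Parse one framed message from `data`."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


event = {
    "seq": 1,
    "type": "event",
    "event": "daffodil.infosetChanged",  # hypothetical event name
    "body": {"bitPosition": 16, "added": "record/length"},
}
wire = encode_message(event)
print(decode_message(wire) == event)  # True
```

Because the framing is just bytes, the same messages could travel over stdin/stdout, a pipe, or a socket, which is what would make a remote or non-JVM front end possible.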


On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> I don't think of it as a daffodil debug protocol, but just a separation of concerns between display of information and the behaviors of parse/unparse that need to be points where users can pause, and data structures available to display.
>
> E.g., it is 100% a display issue that the infoset (shown as XML) is clumsy, too big, etc.  The infoset is available in the processor state, and one can examine the current node, enclosing node, prior sibling(s), following sibling(s), etc. One can elide contents that are too big for hexBinary, etc.
>
> I think this problem, how to display the infoset with sensible limits on sizing, is fairly easy to come up with some design for, that will at least be (1) always fairly small (2) much more useful in more cases. It won't be perfect but can be much better than what we do now.
>
> One sensible display "mode" should be that displaying the context surrounding the current element (when parsing or unparsing) displays at most N-lines. (N/2 before, N/2 after) with a maximum length of L characters (settable within reason ?)
>
> Sibling and enclosing nodes would be displayed eliding their contents to at most 1 line.
>
> Here's an example of what I mean. Displaying up to N=10 lines total:
>
> ...
> <enclosingParent1>
>    ...
>    <priorSibling2>89ab782 ...</...>
>    <priorSibling1>some text is here and some more text</...>
>    <currentNode>value might be some big thing which needs to be elided ...</...>
>    <followingSibling1> ... </...>
>    ???
> </enclosingParent1>
> ???
>
> The </...> is just an idea to reduce XML matching end-tag clutter.
>
> The ... on a line alone or where element content would appear generally means 1 or more other siblings. The way the display above starts with ... means that this is a relative inner nest, not starting from the absolute root.
>
> The ... within simple content means that content is elided to fit on one line. Always follows some text characters to differentiate from the child-element context.
>
> The ??? means zero or more other siblings.
>
> I used bold italic above to point out that the current node would be highlighted somehow. Probably a way to do this that doesn't require display modes would be useful. E.g., a text marker like ">>>" as in:
>
>>>> <currentNode>value .... </...>
>
> might be better, particularly for a trace output being dumped to a text file.
>
> I made the above example an unparser kind of example by showing a following sibling that exists that is after the current node.
>
> I think the key concept is that any sibling node is displayed in a way that fits on one line.
> E.g., even if the element name was really long, I'd suggest:
>
>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>
> Where the element name itself gets elided because it is too long.
>
> A thought. Note that the above presentation is shown as quasi-XML, but there's nothing XML-specific about it. A JSON-friendly equivalent could be done as well:
>
> enclosingParent1 = {
>    ...
>    priorSibling2 = "89ab782..."
>    priorSibling1 = "some text is here and some more text"
>    currentNode = "value might be some big thing which needs to be elided ..."
>    followingSibling1 = { ... }
>    ???
> }
>
> That's enough for 1 email thread on this debug topic.
>
>


Re: The future of the daffodil DFDL schema debugger?

Posted by John Wass <jw...@gmail.com>.
> lives in daffodil repo (new subproject?)

Not asking a question here, meant to snip out those parens.

The daffodil-debug-api and any daffodil-debug-io-NAME projects do represent
new subprojects.

Just wanted to clarify, never see those things till send is hit.



On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:

> Revisiting this post after doing some debugger related work and thinking
> about debug protocol/adapters to connect external tooling to the debug
> process.
>
> This comment is good
>
> > This also makes me wonder if an approach worth taking for the future of
> Daffodil schema debugging is developing a sort of "Daffodil Debug Protocol".
> I imagine it would be loosely based on DAP (which is essentially JSON
> message based) but could be targeted to the things that a DFDL schema
> debugger would really need. An added benefit with some sort of protocol
> is the debugger interface can be uncoupled from Daffodil itself, so we
> could implement a TUI/GUI/whatever in any language/GUI framework and just
> have it communicate the protocol over some form of IPC. Another benefit
> is that any future backends could implement this protocol and so a single
> debugger could hook into different backends without much issue.
> Unfortunately, defining such a protocol might be a large task, but we do
> have our existing debug infrastructure and things like DAP to guide its
> development/design.
>
> Some thoughts on this
> - Defining the protocol will be a large task, but a minimal version should
> get up and round tripping quickly with a minimal subset of the protocol.
> - The new protocol being informed by the existing debugger and DAP is key
> - Uncoupling from Daffodil is key
> - Adapt the Daffodil protocol to produce DAP after the fact so as not to
> constrain Daffodil debugging capability
> - We don't need to tie the protocol or adapters to a single framework,
> implementations of the IO layer should be simple enough to support multiple
> things (eg Akka, Zio, "basic" ...)
> - The current debugger lives in runtime1, but can we make an abstract API
> that any runtime would implement?
>
> Maybe a solution is structured like this
> - daffodil-debug-api:
>   - protocol model
>   - interfaces: debugger / IO adapter / etc
>   - lives in daffodil repo (new subproject?)
> - daffodil-debug-io-NAME
>   - provides implementation of a specific IO adapter
>   - multiple projects possible (daffodil-debugger-akka,
> daffodil-debugger-zio, etc)
>   - supported ones live in their own subprojects, but others can be plugged
> in from external sources
>   - ability to support multiple implementations reduces risk of lock-in
> - debugger applications
>   - maintained in external repositories
>   - depending on the IO implementation these could execute in a separate
> process or on a separate machine
>   - like Steve said, could be any language / framework
>
> Three types of reference implementations / sample applications could also
> guide the development of the API
>   1. a replacement for the existing TUI debugger, expected to end up with
> at minimum the same functionality as the current one.
>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
>   3. an IDE integration
>
> Thoughts?
>
> Also I'm working on some reference implementations of these concepts using
> Akka and Zio.  Not quite ready to talk through it yet, but the code is here
> https://github.com/jw3/example-daffodil-debug
>
>
>

Re: The future of the daffodil DFDL schema debugger?

Posted by John Wass <jw...@gmail.com>.
> Next step is to refine these thoughts with a prototype.

Another next step is to collect feedback on this research and proposed
approach.  Any discussion is appreciated.



On Tue, Apr 20, 2021 at 10:00 AM John Wass <jw...@gmail.com> wrote:

> > Going to look deeper into how DAP might fit with Daffodil
>
> Have been looking over DAP and getting a good feeling about it. The
> specification [1] seems general enough that it could be applied to Daffodil
> and cover a swath of common operations (like start, stop, break, continue,
> code locations, variables, etc).
>
> There are many areas though that are unique to Daffodil that have no
> representation in the spec.  These things (like InputStream, Infoset, PoU,
> different variable types, backtracking, etc) will need an extension to
> DAP.  This really boils down to defining these things to fit under the DAP
> BaseProtocol and enabling handling of those objects on both the front and
> back ends.
>
> On the backend we need a Daffodil DAP protocol server.  Existing JVM
> implementations (like Java [2], Scala [3]) are tied closely to JDI and
> would bring a lot of extra baggage to work around that.  Developing a
> Daffodil specific implementation is no small task, but feasible.  There are
> several existing implementations on the JVM that are close and can be
> looked at for reference.
>
> The backend implementation would look similar to what was described in an
> earlier post.  We could use ZIO/Akka/etc to implement the backend Protocol
> Server to enable the IO between the Daffodil process and the DAP clients.
> This implementation would now be guided by the DAP specification.
>
> With the protocol and backend extended to fit Daffodil that leaves the
> frontend.  In theory an existing IDE plugin should get pretty close to
> being able to perform the common debug operations mentioned above.  To
> support the Daffodil extensions there will need to be handling of the
> extended protocol into whatever views are desired/applicable.
>
> > Also looking into the Java Debug Interface (JDI) for comparison.
>
> JDI appears to be the wrong level of abstraction for what we are talking
> about in debugging Daffodil for schema development.  While DAP does do JVM
> debugging (through a JDI DAP impl) it also generalizes to many other
> debugging scenarios.  JDI on the other hand is very tied to the JVM.
>
> Extending the JDI appears to be more complex than dealing with DAP, and
> even though the JDI API is mostly defined with interfaces, there are choke
> points that limit it to JVM concepts.  For example, jdi.Value has a finite set
> of JVM types that it works with; it's not clear where Daffodil types would
> plug in, if that is even possible.
>
> The final note is that unique Daffodil features wouldn’t get to IDE
> support any faster with JDI.  In some cases, like VS Code, you would still need
> an extended DAP to support these features.
>
> > and depending on how it shakes out will update the example to show
> integration
>
> It would appear wise to investigate DAP further.  Next step is to refine
> these thoughts with a prototype. I started an implementation in the example
> debugger project [4] to try to run the current example on a _minimal_ DAP
> implementation.
>
>
> [1] https://microsoft.github.io/debug-adapter-protocol/specification
> [2] https://github.com/Microsoft/java-debug
> [3] https://github.com/scalacenter/scala-debug-adapter
> [4] https://github.com/jw3/example-daffodil-debug
>
>
> On Mon, Apr 12, 2021 at 9:58 AM John Wass <jw...@gmail.com> wrote:
>
>> > the code is here https://github.com/jw3/example-daffodil-debug
>>
>> There is now a complete console based example for Zio that demonstrates
>> controlling the debug flow while distributing the current state to three
>> "displays".
>> 1. infoset at current step
>> 2. diff of infoset against previous step
>> 3. bit position and value of data.
>>
>> These displays are very rudimentary but demonstrate the ability to
>> asynchronously populate multiple views while synchronously controlling the
>> debug loop.
>>
>> > - The new protocol being informed by existing debugger and DAP is key
>>
>> Going to look deeper into how DAP might fit with Daffodil, and depending
>> on how it shakes out will update the example to show integration.
>>
>> Some interesting links to start with
>> - https://github.com/scalacenter/scala-debug-adapter
>> -
>> https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
>> - https://github.com/microsoft/java-debug
>>
>> Also looking into the Java Debug Interface (JDI) for comparison.
>>
>>
>> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:
>>
>>> Revisiting this post after doing some debugger related work and thinking
>>> about debug protocol/adapters to connect external tooling to the debug
>>> process.
>>>
>>> This comment is good
>>>
>>> > This also makes me wonder if an approach worth taking for the future
>>> > of Daffodil schema debugging is developing a sort of "Daffodil Debug
>>> > Protocol". I imagine it would be loosely based on DAP (which is
>>> > essentially JSON message based) but could be targeted to the things that a
>>> > DFDL schema debugger would really need. An added benefit with some sort of
>>> > protocol is the debugger interface can be uncoupled from Daffodil
>>> > itself, so we could implement a TUI/GUI/whatever in any language/GUI
>>> > framework and just have it communicate the protocol over some form of
>>> > IPC. Another benefit is that any future backends could implement this
>>> > protocol and so a single debugger could hook into different backends
>>> > without much issue. Unfortunately, defining such a protocol might be a
>>> > large task, but we do have our existing debug infrastructure and things
>>> > like DAP to guide its development/design.
>>>
>>> Some thoughts on this
>>> - Defining the protocol will be a large task, but a minimal version
>>> should get up and round tripping quickly with a minimal subset of the
>>> protocol.
>>> - The new protocol being informed by existing debugger and DAP is key
>>> - Uncoupling from Daffodil is key
>>> - Adapt the Daffodil protocol to produce DAP after the fact so as not to
>>> constrain Daffodil debugging capability
>>> - We don't need to tie the protocol or adapters to a single framework,
>>> implementations of the IO layer should be simple enough to support multiple
>>> things (eg Akka, Zio, "basic" ...)
>>> - The current debugger lives in runtime1, but can we make an abstract
>>> API that any runtime would implement?
>>>
>>> Maybe a solution is structured like this
>>> - daffodil-debug-api:
>>>   - protocol model
>>>   - interfaces: debugger / IO adapter / etc
>>>   - lives in daffodil repo (new subproject?)
>>> - daffodil-debug-io-NAME
>>>   - provides implementation of a specific IO adapter
>>>   - multiple projects possible (daffodil-debugger-akka,
>>> daffodil-debugger-zio, etc)
>>>   - supported ones live in their own subprojects, but others can be
>>> plugged in from external sources
>>>   - ability to support multiple implementations reduces risk of lock-in
>>> - debugger applications
>>>   - maintained in external repositories
>>>   - depending on the IO implementation these could execute in a
>>> separate process or on a separate machine
>>>   - like Steve said, could be any language / framework
>>>
>>> Three types of reference implementations / sample applications could
>>> also guide the development of the API
>>>   1. a replacement for the existing TUI debugger, expected to end up
>>> with at minimum the same functionality as the current one.
>>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
>>>   3. an IDE integration
>>>
>>> Thoughts?
>>>
>>> Also I'm working on some reference implementations of these concepts
>>> using Akka and Zio.  Not quite ready to talk through it yet, but the code
>>> is here https://github.com/jw3/example-daffodil-debug
>>>
>>>
>>>
>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
>>> wrote:
>>>
>>>> Yep, something like that seems very reasonable for dealing with large
>>>> infosets. But it feels like we still run into usability issues.
>>>> For example, what if a user wants to see more? We need some
>>>> configuration options to increase what we've elided. It's not big, but
>>>> every new thing that needs configuration adds complexity and decreases
>>>> usability.
>>>>
>>>> And I think the only reason we are trying to spend effort eliding
>>>> things is because we're limited to this gdb-like interface where you can
>>>> only print out a little information at a time.
>>>>
>>>> I think what would really help is to dump this gdb interface and instead use
>>>> multiple windows/views. As a really close example to what I imagine, I
>>>> recently came across this hex editor:
>>>>
>>>> https://www.synalysis.net/
>>>>
>>>> The screenshots are a bit small so it's not super clear, but this tool
>>>> has one view for the data in hex, and one view for a tree of parsed
>>>> results (which is very similar to our infoset). The "infoset" view has
>>>> information like offset/length/value, and can be related back to the
>>>> data view to find the actual bits.
>>>>
>>>> I imagine the "next generation daffodil debugger" to look much like
>>>> this. As data is parsed, the infoset view fills up. This view could act
>>>> like a standard GUI tree so you could collapse sections or scroll around
>>>> to show just the parts you care about, and have search capabilities to
>>>> quickly jump around. The advantage here is you no longer really need
>>>> automated eliding or heuristics for what the user *might* care about.
>>>> You just show the whole thing and let user scroll around. As daffodil
>>>> parses and backtracks, this tree grows or shrinks.
>>>>
>>>> I also imagine you could have a cursor moving around the hex view, so as
>>>> daffodil moves around (e.g. scanning for delimiters, extracting
>>>> integers), one could update this data view to show what daffodil is
>>>> doing and where it is.
>>>>
>>>> I also imagine there could be other views as well. For example, a schema
>>>> view to show where in the schema daffodil is, and to add/remove
>>>> breakpoints. And an information view for things like variables, in-scope
>>>> delimiters, PoU's, etc.
>>>>
>>>> The only reason I mention a debug protocol is that it would allow this GUI
>>>> to be more easily written in something other than Java/Scala to take
>>>> advantage of other GUI toolkits. It's been a long while since I've done
>>>> anything with Java GUIs, but they seemed pretty poor the last time I looked
>>>> at them. It would even allow for a TUI, which Java has little/no support
>>>> for. It also enables things like remote debugging if a socket IPC was
>>>> used. Though I'm not sure all of that is necessary. Just thinking what
>>>> would be ideal, and it can always be pared back.
>>>>
>>>>
>>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>>>> > I don't think of it as a daffodil debug protocol, but just a
>>>> separation of concerns between display of information and the behaviors of
>>>> parse/unparse that need to be points where users can pause, and data
>>>> structures available to display.
>>>> >
>>>> > E.g., it is 100% a display issue that the infoset (shown as XML) is
>>>> clumsy, too big, etc.  The infoset is available in the processor state, and
>>>> one can examine the current node, enclosing node, prior sibling(s),
>>>> following sibling(s), etc. One can elide contents that are too big for
>>>> hexBinary, etc.
>>>> >
>>>> > I think this problem, how to display the infoset with sensible limits
>>>> on sizing, is fairly easy to come up with some design for, that will at
>>>> least be (1) always fairly small (2) much more useful in more cases. It
>>>> won't be perfect but can be much better than what we do now.
>>>> >
>>>> > One sensible display "mode" should be that displaying the context
>>>> surrounding the current element (when parsing or unparsing) displays at
>>>> most N-lines. (N/2 before, N/2 after) with a maximum length of L characters
>>>> (settable within reason ?)
>>>> >
>>>> > Sibling and enclosing nodes would be displayed eliding their contents
>>>> to at most 1 line.
>>>> >
>>>> > Here's an example of what I mean. Displaying up to M=10 lines total:
>>>> >
>>>> > ...
>>>> > <enclosingParent1>
>>>> >    ...
>>>> >    <priorSibling2>89ab782 ...</...>
>>>> >    <priorSibling1>some text is here and some more text</...>
>>>> >    <currentNode>value might be some big thing which needs to be
>>>> elided ...</...>
>>>> >    <followingSibling1> ... </...>
>>>> >    ???
>>>> > </enclosingParent1>
>>>> > ???
>>>> >
>>>> > The </...> is just an idea to reduce XML matching end-tag clutter.
>>>> >
>>>> > The ... on a line alone or where element content would appear
>>>> generally means 1 or more other siblings. The way the display above starts
>>>> with ... means that this is a relative inner nest, not starting from the
>>>> absolute root.
>>>> >
>>>> > The ... within simple content means that content is elided to fit on
>>>> one line. Always follows some text characters to differentiate from the
>>>> child-element context.
>>>> >
>>>> > The ??? means zero or more other siblings.
>>>> >
>>>> > I used bold italic above to point out that the current node would be
>>>> highlighted somehow. Probably a way to do this that doesn't require display
>>>> modes would be useful. E.g., a text marker like ">>>" as in:
>>>> >
>>>> >>>> <currentNode>value .... </...>
>>>> >
>>>> > might be better, particularly for a trace output being dumped to a
>>>> text file.
>>>> >
>>>> > I made the above example an unparser kind of example by showing a
>>>> following sibling that exists that is after the current node.
>>>> >
>>>> > I think the key concept is that any sibling node is displayed in a
>>>> way that fits on one line.
>>>> > E.g., even if the element name was really long, I'd suggest:
>>>> >
>>>> >   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>>>> >
>>>> > Where the element name itself gets elided because it is too long.
>>>> >
>>>> > A thought. Note that the above presentation is shown as quasi-XML,
>>>> but there's nothing XML-specific about it. A JSON-friendly equivalent could
>>>> be done as well:
>>>> >
>>>> > enclosingParent1 = {
>>>> >    ...
>>>> >    priorSibling2 = "89ab782..."
>>>> >    priorSibling1 = "some text is here and some more text"
>>>> >    currentNode = "value might be some big thing which needs to be
>>>> elided ..."
>>>> >    followingSibling1 = { ... }
>>>> >    ???
>>>> > }
>>>> >
>>>> > That's enough for 1 email thread on this debug topic.
>>>> >
>>>> >
>>>> > ________________________________
>>>> > From: Steve Lawrence <sl...@apache.org>
>>>> > Sent: Tuesday, January 5, 2021 2:26 PM
>>>> > To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>>> > Subject: The future of the daffodil DFDL schema debugger?
>>>> >
>>>> >
>>>> > Now that we're in a new year, I'd like to start a discussion about the
>>>> > Daffodil DFDL Schema debugger and how it might be improved to be more
>>>> > useful.
>>>> >
>>>> > Note that this is not the capabilities to debug Daffodil itself in
>>>> > something like Eclipse/IntelliJ, but the ability for Daffodil to
>>>> provide
>>>> > enough extra information during a parse/unparse so that a schema
>>>> > developer can get an idea of what Daffodil is doing. This makes it
>>>> > easier for users (rather than developers) to determine why a schema
>>>> > isn't giving the expected parse/unparse result (either because of bad
>>>> > data or a faulty schema).
>>>> >
>>>> > The current state of the debugger is enabled by providing the --debug
>>>> or
>>>> > --trace flags in the CLI. More information about that here:
>>>> >
>>>> > https://daffodil.apache.org/debugger/
>>>> >
>>>> > This enables a TUI and commands somewhat similar to GDB, providing
>>>> > things like breakpoints, steps, displaying the current infoset,
>>>> > displaying a dump of the data, etc.
>>>> >
>>>> > Although I find this tool pretty useful, it definitely has some
>>>> glaring
>>>> > issues.
>>>> >
>>>> > The most glaring to me is that it really isn't useful at all for
>>>> > debugging unparse. The data dumps only include the main output stream,
>>>> > so determining things like suspensions and buffered output is
>>>> > impossible.
>>>> >
>>>> > Another issue is the infoset output. When outputting the infoset, the
>>>> > debugger currently just walks the entire thing and converts it to XML
>>>> > and displays the XML. For large infosets, this is excessive and can make
>>>> > it impossible to use, even with some configurations that limit how much of
>>>> > that infoset is actually printed to the screen. Also things like large
>>>> > hex binary blobs create excessive and unusable output.
>>>> >
>>>> > Another thing I feel is missing is a schema view. Right now it's very
>>>> > difficult to know where in the schema Daffodil actually is.
>>>> >
>>>> > I think these issues just need some thought and improvement. One could
>>>> > imagine a better way to stringify our unparse buffers for debug. One
>>>> > could imagine a way to receive infoset state changes so the debugger can
>>>> > track things like backtracks and remove infosets. One could imagine a
>>>> > way to display the schema.
>>>> >
>>>> > We just need a better way to stringify the current state of the
>>>> > unparse data including buffers, and we need a way for the debugger to
>>>> > receive state change information about the infoset so it can update
>>>> > displays rather than just constantly printing the entire infoset.
>>>> >
>>>> > However, I think another big issue is just usability in general. I
>>>> > think the CLI usage is reasonable, but it's not always user friendly,
>>>> > and it is difficult to view multiple things at the same time. I think
>>>> > because of this very few people even use this tool. So this seems like
>>>> > perhaps something worth focusing on.
>>>> >
>>>> > My first thought to improving this usability issue would be to
>>>> implement
>>>> > the Debug Adapter Protocol (DAP)
>>>> > (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
>>>> > which many IDE's implement. With this implemented, Daffodil could be
>>>> > plugged in to any IDE that supports it and essentially get debugging
>>>> for
>>>> > free, without the need to worry about the GUI elements.
>>>> >
>>>> > I do have concerns that this just wouldn't have enough functionality
>>>> > that we'd really need. For example, DAP really only has the ability to show
>>>> > code (Daffodil's equivalent is the DFDL schema). There isn't a way to
>>>> > show a live view of the infoset or data. Most DAP IDE's do have a
>>>> > console output, so we could potentially make it so the console output
>>>> > is a live view of infoset/data. But I'm not even sure most DAP friendly
>>>> > IDE's could support this kind of console output. Does anyone have
>>>> > familiarity with DAP IDE's and what kinds of console capabilities
>>>> > are available?
>>>> >
>>>> > I also looked into TUI libraries with the idea that we could just
>>>> extend
>>>> > our current debugger user interface to be a bit friendlier.
>>>> > Unfortunately, there aren't too many Java/Scala TUI libraries and
>>>> those
>>>> > that do exist don't have Apache friendly licenses. We also want to be
>>>> > careful about increasing dependencies just for a debugger that many
>>>> > people might not use, so large graphics libraries are probably out of the
>>>> > question.
>>>> >
>>>> > This also makes me wonder if an approach worth taking for the future
>>>> > of Daffodil schema debugging is developing a sort of "Daffodil Debug
>>>> > Protocol". I imagine it would be loosely based on DAP (which is
>>>> > essentially JSON message based) but could be targeted to the things
>>>> that
>>>> > a DFDL schema debugger would really need. An added benefit with some
>>>> > sort of protocol is the debugger interface can be uncoupled from
>>>> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
>>>> > language/GUI framework and just have it communicate the protocol over
>>>> > some form of IPC. Another benefit is that any future backends could
>>>> > implement this protocol and so a single debugger could hook into
>>>> > different backends without much issue. Unfortunately, defining such a
>>>> > protocol might be a large task, but we do have our existing debug
>>>> > infrastructure and things like DAP to guide its development/design.
>>>> >
>>>> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
>>>> we
>>>> > really just need the few improvements mentioned to the existing
>>>> > debugger. Is that enough to make it usable? Or is an entirely
>>>> different
>>>> > approach needed to debugging schemas?
>>>> >
>>>>
>>>>

Re: The future of the daffodil DFDL schema debugger?

Posted by Adam Rosien <ad...@rosien.net>.
> It looks like some of the code from java-debug can be reused without
> involving JDI. The java-debug project could be viewed as an implementation
> of the DAP communication protocol, coupled with JDI to provide
> request/response values to DAP. For example, the `ProtocolServer` [3]
> hard-codes the JDI, but there's an `AbstractProtocolServer` which only
> handles the DAP communication (as a rough guess).

Update: I've successfully used the java-debug project from Microsoft to
speak DAP without requiring any JDI dependencies, for use in integrating
with the Daffodil Debugger interface.
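
As a rough illustration of that split (not java-debug's actual classes): the
protocol layer only builds DAP response envelopes, while a pluggable backend
supplies the values. The sketch is in Python purely for brevity; the names
`ProtocolLayer` and `SchemaBackend` and the sample schema path are
hypothetical. The `stackTrace` mapping follows the idea raised elsewhere in
this thread of treating the path from the schema root to the current element
as the "call stack".

```python
from itertools import count


class SchemaBackend:
    """Hypothetical stand-in for a Daffodil debugger: no JDI, no JVM threads."""

    def __init__(self, schema_path):
        # (element name, schema line) pairs, from schema root to current element.
        self.schema_path = list(schema_path)

    def stack_frames(self):
        # Put the innermost element first, the way debuggers list the top frame.
        frames = []
        for frame_id, (name, line) in enumerate(reversed(self.schema_path), start=1):
            # Column is not tracked in this sketch, so a fixed value is used.
            frames.append({"id": frame_id, "name": name, "line": line, "column": 1})
        return frames


class ProtocolLayer:
    """Builds DAP response envelopes; knows nothing about the debuggee."""

    def __init__(self, backend):
        self.backend = backend
        self._seq = count(1)  # outgoing sequence numbers

    def handle(self, request):
        command = request["command"]
        response = {
            "seq": next(self._seq),
            "type": "response",
            "request_seq": request["seq"],
            "command": command,
            "success": True,
        }
        if command == "initialize":
            response["body"] = {"supportsConfigurationDoneRequest": True}
        elif command == "stackTrace":
            frames = self.backend.stack_frames()
            response["body"] = {"stackFrames": frames, "totalFrames": len(frames)}
        else:
            response["success"] = False
            response["message"] = f"unsupported command: {command}"
        return response


if __name__ == "__main__":
    layer = ProtocolLayer(SchemaBackend([("root", 10), ("header", 14), ("length", 17)]))
    print(layer.handle({"seq": 1, "type": "request", "command": "stackTrace",
                        "arguments": {"threadId": 1}}))
```

Swapping `SchemaBackend` for something backed by real parser state would leave
`ProtocolLayer` untouched, which is the property that makes reuse without JDI
attractive.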

On Mon, Apr 26, 2021 at 10:23 AM Adam Rosien <ad...@rosien.net> wrote:

> I'm currently seeing what it takes to get a minimal VS Code extension
> talking DAP over stdin/stdout to an external Scala process.

Re: The future of the daffodil DFDL schema debugger?

Posted by Adam Rosien <ad...@rosien.net>.
I'm currently seeing what it takes to get a minimal VS Code extension
talking DAP over stdin/stdout to an external Scala process.
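
For reference, DAP messages on a stdin/stdout transport are framed with a
`Content-Length` header, a blank line, and then a JSON body (this is the DAP
base protocol). A minimal sketch of that framing, in Python for brevity rather
than Scala; the helper names `encode_message` and `read_message` and the
`adapterID` value are made up for illustration and are not part of any
library:

```python
import io
import json
from typing import Optional


def encode_message(message: dict) -> bytes:
    """Frame a DAP message: Content-Length header, blank line, JSON body."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def read_message(stream) -> Optional[dict]:
    """Read one framed message from a binary stream; None at end of stream."""
    length = None
    while True:
        line = stream.readline()
        if not line:
            return None  # stream closed before a complete header arrived
        line = line.strip()
        if not line:
            break  # blank line ends the header section
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    body = stream.read(length)
    if len(body) != length:
        raise ValueError("truncated message body")
    return json.loads(body.decode("utf-8"))


if __name__ == "__main__":
    request = {"seq": 1, "type": "request", "command": "initialize",
               "arguments": {"adapterID": "daffodil"}}
    wire = io.BytesIO(encode_message(request))
    print(read_message(wire) == request)
```

In a real adapter the stream would be the process's binary stdin and the
encoded bytes would be written to its binary stdout; `io.BytesIO` stands in for
both here.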

On Fri, Apr 23, 2021 at 11:01 AM Adam Rosien <ad...@rosien.net> wrote:

> I've looked at scala-debug-adapter a bit now, and it doesn't do very much:
> there's some socket stuff and state management, but otherwise it delegates
> to the underlying java-debug library which manages the DAP protocol [1].
> *That* library does assume use of JDI and supplies JVM-level stuff to DAP
> (threads, etc.).
>
> So I think we don't want to rely on the code directly, but could extract
> the outer "skeleton" of `DebugServer` [2] to use with Daffodil.
>
> It looks like some of the code from java-debug can be reused without
> involving JDI. The java-debug project could be viewed as an implementation
> of the DAP communication protocol, coupled with JDI to provide
> request/response values to DAP. For example, the `ProtocolServer` [3]
> hard-codes the JDI, but there's an `AbstractProtocolServer` which only
> handles the DAP communication (as a rough guess).
>
> I think the next step is to play with the library in the prototype repo to
> see what is really needed.
>
> .. Adam
>
> [1]
> https://github.com/scalacenter/scala-debug-adapter/blob/main/core/src/main/scala/ch/epfl/scala/debugadapter/internal/DebugSession.scala#L35
> extends java-debug `ProtocolServer`.
> [2]
> https://github.com/scalacenter/scala-debug-adapter/blob/main/core/src/main/scala/ch/epfl/scala/debugadapter/DebugServer.scala
> [3]
> https://github.com/microsoft/java-debug/blob/master/com.microsoft.java.debug.core/src/main/java/com/microsoft/java/debug/core/adapter/ProtocolServer.java#L52
>
> On Thu, Apr 22, 2021 at 1:31 PM John Wass <jw...@gmail.com> wrote:
>
>> > dig a bit to see if the DAP-only hooks can be reused without JDI coming
>> along for the ride
>>
>> Cool, that would be good to dig at.  Big win if we can reuse it.
>>
>

Re: The future of the daffodil DFDL schema debugger?

Posted by Adam Rosien <ad...@rosien.net>.
I've looked at scala-debug-adapter a bit now, and it doesn't do very much:
there's some socket stuff and state management, but otherwise it delegates
to the underlying java-debug library which manages the DAP protocol [1].
*That* library does assume use of JDI and supplies JVM-level stuff to DAP
(threads, etc.).

So I think we don't want to rely on the code directly, but could extract
the outer "skeleton" of `DebugServer` [2] to use with Daffodil.

It looks like some of the code from java-debug can be reused without
involving JDI. The java-debug project could be viewed as an implementation
of the DAP communication protocol, coupled with JDI to provide
request/response values to DAP. For example, the `ProtocolServer` [3]
hard-codes the JDI, but there's an `AbstractProtocolServer` which only
handles the DAP communication (as a rough guess).

I think the next step is to play with the library in the prototype repo to
see what is really needed.

.. Adam

[1]
https://github.com/scalacenter/scala-debug-adapter/blob/main/core/src/main/scala/ch/epfl/scala/debugadapter/internal/DebugSession.scala#L35
extends java-debug `ProtocolServer`.
[2]
https://github.com/scalacenter/scala-debug-adapter/blob/main/core/src/main/scala/ch/epfl/scala/debugadapter/DebugServer.scala
[3]
https://github.com/microsoft/java-debug/blob/master/com.microsoft.java.debug.core/src/main/java/com/microsoft/java/debug/core/adapter/ProtocolServer.java#L52

On Thu, Apr 22, 2021 at 1:31 PM John Wass <jw...@gmail.com> wrote:

> > dig a bit to see if the DAP-only hooks can be reused without JDI coming
> along for the ride
>
> Cool, that would be good to dig at.  Big win if we can reuse it.
>

Re: The future of the daffodil DFDL schema debugger?

Posted by John Wass <jw...@gmail.com>.
> dig a bit to see if the DAP-only hooks can be reused without JDI coming
> along for the ride

Cool, that would be good to dig at.  Big win if we can reuse it.

Re: The future of the daffodil DFDL schema debugger?

Posted by Adam Rosien <ad...@rosien.net>.
I was thinking of approaching DAP integration via scala-debug-adapter, but
as you say, it is intended to provide JDI-via-DAP, so I'll dig a bit to see
if the DAP-only hooks can be reused without JDI coming along for the ride.

On Wed, Apr 21, 2021 at 5:58 PM John Wass <jw...@gmail.com> wrote:

> Thanks Adam, the DAP variable angle is interesting.  So are you thinking
> all aspects are covered without defining any new DAP interfaces?
>
> What about the backend, do you think a Daffodil debug server implementation
> is needed?
>
> When looking at the Java Debug server, for both Scala and Java, it looked
> very much tied to JDI and debugging a virtual machine.  Did you see
> anything at all that could be reused there?
>
> It seemed to me that whether we extend DAP or not custom backend server
> components need to be implemented to provide Daffodil debug sessions rather
> than the JDI JVM sessions.
>
>
>
>
> On Wed, Apr 21, 2021 at 7:52 PM Adam Rosien <ad...@rosien.net> wrote:
>
> > I've been reading up on DAP and wanted to share...
> >
> > > There are many areas though that are unique to Daffodil that have no
> > representation in the spec.  These things (like InputStream, Infoset,
> PoU,
> > different variable types, backtracking, etc) will need an extension to
> > DAP.  This really boils down to defining these things to fit under the
> DAP
> > BaseProtocol and enabling handling of those objects on both the front and
> > back ends.
> >
> > To me, much of the current state exposed by the (Daffodil) Debugger
> > translates directly to a DAP Variable[1]. DAP Variables can be
> > nested/hierarchical, so they could (potentially) model larger data like
> the
> > infoset. I can imagine shoving all the current state into Variables as a
> > proof-of-concept.
> >
> > It also seems like the processing stack maintained by the Daffodil
> PState,
> > where each item references the relevant schema element, could translate
> to
> > the DAP StackFrame type [2]. That is, the path from the schema root to
> the
> > currently processing schema element becomes the "call stack". (Apologies
> if
> > I don't have all the Daffodil terms lined up correctly.)
> >
> > For displaying the input data and processing progress, I looked at a few
> > existing VS Code extensions that provided non-builtin views, some of
> which
> > interact with their DAP debugger code [3] [4] [5] [6].
> >
> > Finally, I took a cursory look at scala-debug-adapter [7], which, for
> > reference, wraps Microsoft's java-debug implementation of DAP. I was
> > curious about the set of request/response and event types. Additionally,
> > the Typescript API to VS Code offers custom DAP requests and responses,
> but
> > I couldn't find the equivalent notion in the java-debug project.
> >
> > .. Adam
> >
> > [1]
> >
> >
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
> > [2]
> >
> >
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_StackFrame
> > [3] https://github.com/scalameta/metals-vscode (provides a debugger and
> > non-debugger custom UI)
> > [4] https://github.com/microsoft/vscode-cpptools (debugger + memory
> view)
> > [5]
> > https://marketplace.visualstudio.com/items?itemName=marus25.cortex-debug
> > (debugger + memory view,
> >
> >
> https://github.com/Marus/cortex-debug/blob/master/src/frontend/memory_content_provider.ts
> > )
> > [6]
> >
> >
> https://marketplace.visualstudio.com/items?itemName=slevesque.vscode-hexdump
> > (extension for hexdumps that could be controlled by other extensions)
> > [7] https://github.com/scalacenter/scala-debug-adapter
> > [8] https://github.com/microsoft/java-debug
> >
> > On Tue, Apr 20, 2021 at 7:08 AM John Wass <jw...@gmail.com> wrote:
> >
> > > > Going to look deeper into how DAP might fit with Daffodil
> > >
> > > Have been looking over DAP and getting a good feeling about it. The
> > > specification [1] seems general enough that it could be applied to
> > Daffodil
> > > and cover a swath of common operations (like start, stop, break,
> > continue,
> > > code locations, variables, etc).
> > >
> > > There are many areas though that are unique to Daffodil that have no
> > > representation in the spec.  These things (like InputStream, Infoset,
> > PoU,
> > > different variable types, backtracking, etc) will need an extension to
> > > DAP.  This really boils down to defining these things to fit under the
> > DAP
> > > BaseProtocol and enabling handling of those objects on both the front
> and
> > > back ends.
> > >
> > > On the backend we need a Daffodil DAP protocol server.  Existing JVM
> > > implementations (like Java [2], Scala [3]) are tied closely to JDI and
> > > would bring a lot of extra baggage to work around that.  Developing a
> > > Daffodil-specific implementation is no small task, but it is feasible.  There
> > > are several existing implementations on the JVM that are close and can be
> > > looked at for reference.
> > >
> > > The backend implementation would look similar to what was described in
> an
> > > earlier post.  We could use ZIO/Akka/etc to implement the backend
> > Protocol
> > > Server to enable the IO between the Daffodil process and the DAP
> clients.
> > > This implementation would now be guided by the DAP specification.
> > >
> > > With the protocol and backend extended to fit Daffodil that leaves the
> > > frontend.  In theory an existing IDE plugin should get pretty close to
> > > being able to perform the common debug operations mentioned above.  To
> > > support the Daffodil extensions there will need to be handling of the
> > > extended protocol into whatever views are desired/applicable.
> > >
> > > > Also looking into the Java Debug Interface (JDI) for comparison.
> > >
> > > JDI appears to be the wrong level of abstraction for what we are
> talking
> > > about in debugging Daffodil for schema development.  While DAP does do
> > JVM
> > > debugging (through a JDI DAP impl) it also generalizes to many other
> > > debugging scenarios.  JDI on the other hand is very tied to the JVM.
> > >
> > > Extending the JDI appears to be more complex than dealing with DAP, and
> > > even though the JDI API is mostly defined with interfaces, there are
> > > choke points that limit it to JVM concepts.  For example, jdi.Value has a
> > > finite set of JVM types that it works with; it's not clear where Daffodil
> > > types would plug in, if that is even possible.
> > >
> > > The final note is that unique Daffodil features wouldn’t get to IDE
> > > support any faster with JDI.  In some cases, like VS Code, you would still
> > > need an extended DAP to support these features.
> > >
> > > > > and depending on how it shakes out will update the example to show
> > > > > integration
> > >
> > > > It would appear wise to investigate DAP further.  Next step is to
> > > > refine these thoughts with a prototype. I started an implementation in
> > > > the example debugger project [4] to try to run the current example on a
> > > > _minimal_ DAP implementation.
> > >
> > >
> > > [1] https://microsoft.github.io/debug-adapter-protocol/specification
> > > [2] https://github.com/Microsoft/java-debug
> > > [3] https://github.com/scalacenter/scala-debug-adapter
> > > [4] https://github.com/jw3/example-daffodil-debug
> > >
> > >
> > > On Mon, Apr 12, 2021 at 9:58 AM John Wass <jw...@gmail.com> wrote:
> > >
> > > > > the code is here https://github.com/jw3/example-daffodil-debug
> > > >
> > > > > There is now a complete console based example for Zio that
> > > > > demonstrates controlling the debug flow while distributing the
> > > > > current state to three "displays".
> > > > 1. infoset at current step
> > > > 2. diff of infoset against previous step
> > > > 3. bit position and value of data.
> > > >
> > > > > These displays are very rudimentary but demonstrate the ability to
> > > > > asynchronously populate multiple views while synchronously
> > > > > controlling the debug loop.
> > > >
> > > > > > - The new protocol being informed by existing debugger and DAP is key
> > > >
> > > > > Going to look deeper into how DAP might fit with Daffodil, and
> > > > > depending on how it shakes out will update the example to show
> > > > > integration.
> > > >
> > > > Some interesting links to start with
> > > > - https://github.com/scalacenter/scala-debug-adapter
> > > > -
> > > >
> > >
> >
> https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
> > > > - https://github.com/microsoft/java-debug
> > > >
> > > > Also looking into the Java Debug Interface (JDI) for comparison.
> > > >
> > > >
> > > > On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:
> > > >
> > > > >> Revisiting this post after doing some debugger related work and
> > > > >> thinking about debug protocol/adapters to connect external tooling
> > > > >> to the debug process.
> > > >>
> > > >> This comment is good
> > > >>
> > > > >> > This also makes me wonder if an approach worth taking for the
> > > > >> > future of Daffodil schema debugging is developing a sort of
> > > > >> > "Daffodil Debug Protocol". I imagine it would be loosely based on
> > > > >> > DAP (which is essentially JSON message based) but could be
> > > > >> > targeted to the things that a DFDL schema debugger would really
> > > > >> > need. An added benefit with some sort of protocol is the debugger
> > > > >> > interface can be uncoupled from Daffodil itself, so we could
> > > > >> > implement a TUI/GUI/whatever in any language/GUI framework and
> > > > >> > just have it communicate the protocol over some form of IPC.
> > > > >> > Another benefit is that any future backends could implement this
> > > > >> > protocol and so a single debugger could hook into different
> > > > >> > backends without much issue. Unfortunately, defining such a
> > > > >> > protocol might be a large task, but we do have our existing debug
> > > > >> > infrastructure and things like DAP to guide its
> > > > >> > development/design.
> > > >>
> > > >> Some thoughts on this
> > > > >> - Defining the protocol will be a large task, but a minimal version
> > > > >> should get up and round tripping quickly with a minimal subset of
> > > > >> the protocol.
> > > > >> - The new protocol being informed by existing debugger and DAP is key
> > > > >> - Uncoupling from Daffodil is key
> > > > >> - Adapt the Daffodil protocol to produce DAP after the fact so as not
> > > > >> to constrain Daffodil debugging capability
> > > > >> - We don't need to tie the protocol or adapters to a single
> > > > >> framework; implementations of the IO layer should be simple enough
> > > > >> to support multiple things (e.g. Akka, Zio, "basic" ...)
> > > > >> - The current debugger lives in runtime1, but can we make an
> > > > >> abstract API that any runtime would implement?
> > > >>
> > > >> Maybe a solution is structured like this
> > > >> - daffodil-debug-api:
> > > >>   - protocol model
> > > >>   - interfaces: debugger / IO adapter / etc
> > > >>   - lives in daffodil repo (new subproject?)
> > > > >> - daffodil-debug-io-NAME
> > > > >>   - provides implementation of a specific IO adapter
> > > > >>   - multiple projects possible (daffodil-debugger-akka,
> > > > >>     daffodil-debugger-zio, etc)
> > > > >>   - supported ones live in their own subprojects, but others can be
> > > > >>     plugged in from external sources
> > > > >>   - ability to support multiple implementations reduces risk of
> > > > >>     lock-in
> > > > >> - debugger applications
> > > > >>   - maintained in external repositories
> > > > >>   - depending on the IO implementation these could execute in a
> > > > >>     separate process or on a separate machine
> > > > >>   - like Steve said, could be any language / framework
> > > >>
> > > > >> Three types of reference implementations / sample applications could
> > > > >> also guide the development of the API
> > > > >>   1. a replacement for the existing TUI debugger, expected to end up
> > > > >>      with at minimum the same functionality as the current one.
> > > > >>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
> > > > >>   3. an IDE integration
> > > >>
> > > >> Thoughts?
> > > >>
> > > > >> Also I'm working on some reference implementations of these concepts
> > > > >> using Akka and Zio.  Not quite ready to talk through it yet, but the
> > > > >> code is here https://github.com/jw3/example-daffodil-debug
> > > >>
> > > >>
> > > >>
> > > > >> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <slawrence@apache.org>
> > > > >> wrote:
> > > >>
> > > > >>> Yep, something like that seems very reasonable for dealing with
> > > > >>> large infosets. But it still feels like we run into usability
> > > > >>> issues. For example, what if a user wants to see more? We need some
> > > > >>> configuration options to increase what we've elided. It's not big,
> > > > >>> but every new thing that needs configuration adds complexity and
> > > > >>> decreases usability.
> > > > >>>
> > > > >>> And I think the only reason we are trying to spend effort eliding
> > > > >>> things is because we're limited to this gdb-like interface where
> > > > >>> you can only print out a little information at a time.
> > > > >>>
> > > > >>> I think what would really help is to dump this gdb interface and
> > > > >>> instead use multiple windows/views. As a really close example to
> > > > >>> what I imagine, I recently came across this hex editor:
> > > >>>
> > > >>> https://www.synalysis.net/
> > > >>>
> > > > >>> The screenshots are a bit small so it's not super clear, but this
> > > > >>> tool has one view for the data in hex, and one view for a tree of
> > > > >>> parsed results (which is very similar to our infoset). The
> > > > >>> "infoset" view has information like offset/length/value, and can be
> > > > >>> related back to the data view to find the actual bits.
> > > > >>>
> > > > >>> I imagine the "next generation daffodil debugger" to look much like
> > > > >>> this. As data is parsed, the infoset view fills up. This view could
> > > > >>> act like a standard GUI tree so you could collapse sections or
> > > > >>> scroll around to show just the parts you care about, and have
> > > > >>> search capabilities to quickly jump around. The advantage here is
> > > > >>> you no longer really need automated eliding or heuristics for what
> > > > >>> the user *might* care about. You just show the whole thing and let
> > > > >>> the user scroll around. As daffodil parses and backtracks, this
> > > > >>> tree grows or shrinks.
> > > > >>>
> > > > >>> I also imagine you could have a cursor moving around the hex view,
> > > > >>> so as daffodil moves around (e.g. scanning for delimiters,
> > > > >>> extracting integers), one could update this data view to show what
> > > > >>> daffodil is doing and where it is.
> > > > >>>
> > > > >>> I also imagine there could be other views as well. For example, a
> > > > >>> schema view to show where in the schema daffodil is, and to
> > > > >>> add/remove breakpoints. And an information view for things like
> > > > >>> variables, in-scope delimiters, PoU's, etc.
> > > > >>>
> > > > >>> The only reason I mention a debug protocol is that would allow this
> > > > >>> GUI to be more easily written in something other than Java/Scala to
> > > > >>> take advantage of other GUI toolkits. It's been a long while since
> > > > >>> I've done anything with Java GUIs, but they seemed pretty poor the
> > > > >>> last I looked at them. Would even allow for a TUI, which Java has
> > > > >>> little/no support for. Also enables things like remote debugging if
> > > > >>> a socket IPC was used. Though I'm not sure all of that is
> > > > >>> necessary. Just thinking what would be ideal, and it can always be
> > > > >>> pared back.
> > > >>>
> > > >>>
> > > >>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> > > > >>> > I don't think of it as a daffodil debug protocol, but just a
> > > > >>> > separation of concerns between display of information and the
> > > > >>> > behaviors of parse/unparse that need to be points where users can
> > > > >>> > pause, and data structures available to display.
> > > > >>> >
> > > > >>> > E.g., it is 100% a display issue that the infoset (shown as XML)
> > > > >>> > is clumsy, too big, etc.  The infoset is available in the
> > > > >>> > processor state, and one can examine the current node, enclosing
> > > > >>> > node, prior sibling(s), following sibling(s), etc. One can elide
> > > > >>> > contents that are too big for hexBinary, etc.
> > > > >>> >
> > > > >>> > I think this problem, how to display the infoset with sensible
> > > > >>> > limits on sizing, is fairly easy to come up with some design for,
> > > > >>> > that will at least be (1) always fairly small (2) much more
> > > > >>> > useful in more cases. It won't be perfect but can be much better
> > > > >>> > than what we do now.
> > > > >>> >
> > > > >>> > One sensible display "mode" should be that displaying the context
> > > > >>> > surrounding the current element (when parsing or unparsing)
> > > > >>> > displays at most N lines (N/2 before, N/2 after) with a maximum
> > > > >>> > length of L characters (settable within reason ?)
> > > > >>> >
> > > > >>> > Sibling and enclosing nodes would be displayed eliding their
> > > > >>> > contents to at most 1 line.
> > > > >>> >
> > > > >>> > Here's an example of what I mean. Displaying up to M=10 lines
> > > > >>> > total:
> > > >>> >
> > > >>> > ...
> > > >>> > <enclosingParent1>
> > > >>> >    ...
> > > >>> >    <priorSibling2>89ab782 ...</...>
> > > >>> >    <priorSibling1>some text is here and some more text</...>
> > > >>> >    <currentNode>value might be some big thing which needs to be
> > > elided
> > > >>> ...</...>
> > > >>> >    <followingSibling1> ... </...>
> > > >>> >    ???
> > > >>> > </enclosingParent1>
> > > >>> > ???
> > > >>> >
> > > > >>> > The </...> is just an idea to reduce XML matching end-tag
> > > > >>> > clutter.
> > > > >>> >
> > > > >>> > The ... on a line alone or where element content would appear
> > > > >>> > generally means 1 or more other siblings. The way the display
> > > > >>> > above starts with ... means that this is a relative inner nest,
> > > > >>> > not starting from the absolute root.
> > > > >>> >
> > > > >>> > The ... within simple content means that content is elided to fit
> > > > >>> > on one line. Always follows some text characters to differentiate
> > > > >>> > from the child-element context.
> > > > >>> >
> > > > >>> > The ??? means zero or more other siblings.
> > > > >>> >
> > > > >>> > I used bold italic above to point out that the current node would
> > > > >>> > be highlighted somehow. Probably a way to do this that doesn't
> > > > >>> > require display modes would be useful. E.g., a text marker like
> > > > >>> > ">>>" as in:
> > > >>> >
> > > >>> >>>> <currentNode>value .... </...>
> > > >>> >
> > > > >>> > might be better, particularly for a trace output being dumped to
> > > > >>> > a text file.
> > > > >>> >
> > > > >>> > I made the above example an unparser kind of example by showing a
> > > > >>> > following sibling that exists that is after the current node.
> > > > >>> >
> > > > >>> > I think the key concept is that any sibling node is displayed in
> > > > >>> > a way that fits on one line.
> > > >>> > E.g., even if the element name was really long, I'd suggest:
> > > >>> >
> > > >>> >   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> > > >>> >
> > > >>> > Where the element name itself gets elided because it is too long.
> > > >>> >
> > > > >>> > A thought. Note that the above presentation is shown as
> > > > >>> > quasi-XML, but there's nothing XML-specific about it. A
> > > > >>> > JSON-friendly equivalent could be done as well:
> > > >>> >
> > > >>> > enclosingParent1 = {
> > > >>> >    ...
> > > >>> >    priorSibling2 = "89ab782..."
> > > >>> >    priorSibling1 = "some text is here and some more text"
> > > > >>> >    currentNode = "value might be some big thing which needs to be elided ..."
> > > >>> >    followingSibling1 = { ... }
> > > >>> >    ???
> > > >>> > }
> > > >>> >
> > > >>> > That's enough for 1 email thread on this debug topic.
> > > >>> >
> > > >>> >
> > > >>> > ________________________________
> > > >>> > From: Steve Lawrence <sl...@apache.org>
> > > >>> > Sent: Tuesday, January 5, 2021 2:26 PM
> > > >>> > To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> > > >>> > Subject: The future of the daffodil DFDL schema debugger?
> > > >>> >
> > > >>> >
> > > > >>> > Now that we're in a new year, I'd like to start a discussion
> > > > >>> > about the Daffodil DFDL Schema debugger and how it might be
> > > > >>> > improved to be more useful.
> > > > >>> >
> > > > >>> > Note that this is not the capability to debug Daffodil itself in
> > > > >>> > something like Eclipse/IntelliJ, but the ability for Daffodil to
> > > > >>> > provide enough extra information during a parse/unparse so that a
> > > > >>> > schema developer can get an idea of what Daffodil is doing. This
> > > > >>> > makes it easier for users (rather than developers) to determine
> > > > >>> > why a schema isn't giving the expected parse/unparse result
> > > > >>> > (either because of bad data or a faulty schema).
> > > > >>> >
> > > > >>> > The current state of the debugger is enabled by providing the
> > > > >>> > --debug or --trace flags in the CLI. More information about that
> > > > >>> > here:
> > > >>> >
> > > >>> > https://daffodil.apache.org/debugger/
> > > >>> >
> > > > >>> > This enables a TUI and commands somewhat similar to GDB,
> > > > >>> > providing things like breakpoints, steps, displaying the current
> > > > >>> > infoset, displaying a dump of the data, etc.
> > > > >>> >
> > > > >>> > Although I find this tool pretty useful, it definitely has some
> > > > >>> > glaring issues.
> > > > >>> >
> > > > >>> > The most glaring to me is that it really isn't useful at all for
> > > > >>> > debugging unparse. The data dumps only include the main output
> > > > >>> > stream, so determining things like suspensions and buffered
> > > > >>> > output is impossible.
> > > > >>> >
> > > > >>> > Another issue is the infoset output. When outputting the infoset,
> > > > >>> > the debugger currently just walks the entire thing and converts
> > > > >>> > it to XML and displays the XML. For large infosets, this is
> > > > >>> > excessive and can make it impossible to use, even with some
> > > > >>> > configurations that limit how much of that infoset is actually
> > > > >>> > printed to the screen. Also things like large hex binary blobs
> > > > >>> > create excessive and unusable output.
> > > > >>> >
> > > > >>> > Another thing I feel is missing is a schema view. Right now it's
> > > > >>> > very difficult to know where in the schema Daffodil actually is.
> > > >>> >
> > > > >>> > I think these issues just need some thought and improvement. One
> > > > >>> > could imagine a better way to stringify our unparse buffers for
> > > > >>> > debug. One could imagine a way to receive infoset state changes
> > > > >>> > so the debugger can track things like backtracks and remove
> > > > >>> > infosets. One could imagine a way to display the schema.
> > > > >>> >
> > > > >>> > We just need a better way to stringify the current state of the
> > > > >>> > unparse data including buffers, and we need a way for the
> > > > >>> > debugger to receive state change information about the infoset so
> > > > >>> > it can update displays rather than just constantly printing the
> > > > >>> > entire infoset.
> > > > >>> >
> > > > >>> > However, I think another big issue is just usability in general.
> > > > >>> > I think the CLI usage is reasonable, but it's not always user
> > > > >>> > friendly, and it is difficult to view multiple things at the same
> > > > >>> > time. I think because of this very few people even use this tool.
> > > > >>> > So this seems like perhaps something worth focusing on.
> > > >>> >
> > > > >>> > My first thought to improving this usability issue would be to
> > > > >>> > implement the Debug Adapter Protocol (DAP)
> > > > >>> > (https://microsoft.github.io/debug-adapter-protocol/) for
> > > > >>> > Daffodil, which many IDEs implement. With this implemented,
> > > > >>> > Daffodil could be plugged in to any IDE that supports it and
> > > > >>> > essentially get debugging for free, without the need to worry
> > > > >>> > about the GUI elements.
> > > > >>> >
> > > > >>> > I do have concerns that this just wouldn't have enough
> > > > >>> > functionality that we'd really need. For example, DAP really only
> > > > >>> > has the ability to show code (Daffodil's equivalent is the DFDL
> > > > >>> > schema). There isn't a way to show a live view of the infoset or
> > > > >>> > data. Most DAP IDEs do have a console output, so we could
> > > > >>> > potentially make it so the console output is a live view of
> > > > >>> > infoset/data. But I'm not even sure most DAP friendly IDEs could
> > > > >>> > support this kind of console output. Does anyone have familiarity
> > > > >>> > with DAP IDEs and what kinds of console capabilities are
> > > > >>> > available?
> > > > >>> >
> > > > >>> > I also looked into TUI libraries with the idea that we could just
> > > > >>> > extend our current debugger user interface to be a bit
> > > > >>> > friendlier. Unfortunately, there aren't too many Java/Scala TUI
> > > > >>> > libraries and those that do exist don't have Apache friendly
> > > > >>> > licenses. We also want to be careful about increasing
> > > > >>> > dependencies just for a debugger that many people might not use,
> > > > >>> > so large graphics libraries are probably out of the question.
> > > >>> >
> > > > >>> > This also makes me wonder if an approach worth taking for the
> > > > >>> > future of Daffodil schema debugging is developing a sort of
> > > > >>> > "Daffodil Debug Protocol". I imagine it would be loosely based on
> > > > >>> > DAP (which is essentially JSON message based) but could be
> > > > >>> > targeted to the things that a DFDL schema debugger would really
> > > > >>> > need. An added benefit with some sort of protocol is the debugger
> > > > >>> > interface can be uncoupled from Daffodil itself, so we could
> > > > >>> > implement a TUI/GUI/whatever in any language/GUI framework and
> > > > >>> > just have it communicate the protocol over some form of IPC.
> > > > >>> > Another benefit is that any future backends could implement this
> > > > >>> > protocol and so a single debugger could hook into different
> > > > >>> > backends without much issue. Unfortunately, defining such a
> > > > >>> > protocol might be a large task, but we do have our existing debug
> > > > >>> > infrastructure and things like DAP to guide its
> > > > >>> > development/design.
> > > > >>> >
> > > > >>> > Thoughts? Does such a Daffodil Debug Protocol seem worth it?
> > > > >>> > Perhaps we really just need the few improvements mentioned to the
> > > > >>> > existing debugger. Is that enough to make it usable? Or is an
> > > > >>> > entirely different approach needed to debugging schemas?
> > > >>> >
> > > >>>
> > > >>>
> > >
> >
>

Re: The future of the daffodil DFDL schema debugger?

Posted by John Wass <jw...@gmail.com>.
Thanks Adam, the DAP variable angle is interesting.  So are you thinking
all aspects are covered without defining any new DAP interfaces?
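
For illustration, here is roughly how I picture an infoset going into DAP
Variables. This is only a sketch in Python, not Daffodil code: the name,
value and variablesReference fields come from the DAP Variable type, while
VariableStore and the sample infoset are made up for the example.

```python
# Sketch only: maps a toy infoset (nested dicts) onto DAP-style Variable objects.
# VariableStore and the sample infoset are hypothetical, not Daffodil APIs.
from itertools import count


class VariableStore:
    """Hands out variablesReference ids and remembers the node behind each one."""

    def __init__(self):
        self._ids = count(1)   # in DAP, 0 means "no children", so start at 1
        self._children = {}    # reference id -> complex (dict) node

    def to_variable(self, name, node):
        if isinstance(node, dict):
            # Complex element: give it a reference so a client can expand it.
            ref = next(self._ids)
            self._children[ref] = node
            return {"name": name, "value": "", "variablesReference": ref}
        # Simple element: the value is shown as a string and has no children.
        return {"name": name, "value": str(node), "variablesReference": 0}

    def variables(self, ref):
        """Roughly what a 'variables' request for this reference would return.

        Note: this naive version allocates fresh references on every call.
        """
        return [self.to_variable(k, v) for k, v in self._children[ref].items()]


infoset = {"header": {"length": 4, "tag": "ab"}, "payload": "89ab782"}
store = VariableStore()
root = store.to_variable("record", infoset)
top = store.variables(root["variablesReference"])
```

The point is that a client only fetches children when a node is expanded,
which may sidestep the "dump the whole infoset" problem. Whether that is
enough for things like backtracking or the data position is exactly my
question.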

What about the backend, do you think a Daffodil debug server implementation
is needed?

When looking at the Java Debug server, for both Scala and Java, it looked
very much tied to JDI and debugging a virtual machine.  Did you see
anything at all that could be reused there?

It seemed to me that, whether we extend DAP or not, custom backend server
components need to be implemented to provide Daffodil debug sessions rather
than the JDI JVM sessions.
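
To make the backend question concrete, the sketch below shows the DAP base
protocol framing (a Content-Length header followed by a JSON body) that any
Daffodil debug server would have to speak. It is Python for brevity, and the
event name "daffodil.infoset" and its body are hypothetical, not something
DAP or Daffodil defines.

```python
# Sketch only: DAP base-protocol framing, independent of any JVM/JDI code.
import json


def encode_message(message: dict) -> bytes:
    """Frame one protocol message as 'Content-Length: N\\r\\n\\r\\n' + JSON."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes):
    """Return (message, remaining bytes), or (None, data) if data is incomplete."""
    head, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        return None, data          # header not fully received yet
    length = None
    for line in head.split(b"\r\n"):
        key, _, value = line.partition(b":")
        if key.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    if len(rest) < length:
        return None, data          # body not fully received yet
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


# Hypothetical custom event a Daffodil backend might emit after a parse step.
event = {
    "seq": 1,
    "type": "event",
    "event": "daffodil.infoset",
    "body": {"bitPosition": 16},
}
wire = encode_message(event)
decoded, leftover = decode_message(wire)
```

Nothing in this layer knows about the JVM, which is why I think the server
side can be written for Daffodil directly instead of reusing the JDI-based
servers.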




On Wed, Apr 21, 2021 at 7:52 PM Adam Rosien <ad...@rosien.net> wrote:

> I've been reading up on DAP and wanted to share...
>
> > There are many areas though that are unique to Daffodil that have no
> > representation in the spec.  These things (like InputStream, Infoset,
> > PoU, different variable types, backtracking, etc) will need an extension
> > to DAP.  This really boils down to defining these things to fit under the
> > DAP BaseProtocol and enabling handling of those objects on both the front
> > and back ends.
>
> To me, much of the current state exposed by the (Daffodil) Debugger
> translates directly to a DAP Variable[1]. DAP Variables can be
> nested/hierarchical, so they could (potentially) model larger data like the
> infoset. I can imagine shoving all the current state into Variables as a
> proof-of-concept.
>
> It also seems like the processing stack maintained by the Daffodil PState,
> where each item references the relevant schema element, could translate to
> the DAP StackFrame type [2]. That is, the path from the schema root to the
> currently processing schema element becomes the "call stack". (Apologies if
> I don't have all the Daffodil terms lined up correctly.)
>
> For displaying the input data and processing progress, I looked at a few
> existing VS Code extensions that provided non-builtin views, some of which
> interact with their DAP debugger code [3] [4] [5] [6].
>
> Finally, I took a cursory look at scala-debug-adapter [7], which, for
> reference, wraps Microsoft's java-debug [8] implementation of DAP. I was
> curious about the set of request/response and event types. Additionally,
> the TypeScript API to VS Code offers custom DAP requests and responses, but
> I couldn't find the equivalent notion in the java-debug project.
>
> .. Adam
>
> [1] https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
> [2] https://microsoft.github.io/debug-adapter-protocol/specification#Types_StackFrame
> [3] https://github.com/scalameta/metals-vscode (provides a debugger and
>     non-debugger custom UI)
> [4] https://github.com/microsoft/vscode-cpptools (debugger + memory view)
> [5] https://marketplace.visualstudio.com/items?itemName=marus25.cortex-debug
>     (debugger + memory view,
>     https://github.com/Marus/cortex-debug/blob/master/src/frontend/memory_content_provider.ts)
> [6] https://marketplace.visualstudio.com/items?itemName=slevesque.vscode-hexdump
>     (extension for hexdumps that could be controlled by other extensions)
> [7] https://github.com/scalacenter/scala-debug-adapter
> [8] https://github.com/microsoft/java-debug
>
> On Tue, Apr 20, 2021 at 7:08 AM John Wass <jw...@gmail.com> wrote:
>
> > > Going to look deeper into how DAP might fit with Daffodil
> >
> > Have been looking over DAP and getting a good feeling about it. The
> > specification [1] seems general enough that it could be applied to
> > Daffodil and cover a swath of common operations (like start, stop, break,
> > continue, code locations, variables, etc).
> >
> > There are many areas though that are unique to Daffodil that have no
> > representation in the spec.  These things (like InputStream, Infoset,
> > PoU, different variable types, backtracking, etc) will need an extension
> > to DAP.  This really boils down to defining these things to fit under the
> > DAP BaseProtocol and enabling handling of those objects on both the front
> > and back ends.
> >
> > On the backend we need a Daffodil DAP protocol server.  Existing JVM
> > implementations (like Java [2], Scala [3]) are tied closely to JDI and
> > would bring a lot of extra baggage to work around that.  Developing a
> > Daffodil specific implementation is no small task, but feasible.  There
> > are several existing implementations on the JVM that are close and can be
> > looked at for reference.
> >
> > The backend implementation would look similar to what was described in an
> > earlier post.  We could use ZIO/Akka/etc to implement the backend
> > Protocol Server to enable the IO between the Daffodil process and the DAP
> > clients.  This implementation would now be guided by the DAP
> > specification.
> >
> > With the protocol and backend extended to fit Daffodil that leaves the
> > frontend.  In theory an existing IDE plugin should get pretty close to
> > being able to perform the common debug operations mentioned above.  To
> > support the Daffodil extensions there will need to be handling of the
> > extended protocol into whatever views are desired/applicable.
> >
> > > Also looking into the Java Debug Interface (JDI) for comparison.
> >
> > JDI appears to be the wrong level of abstraction for what we are talking
> > about in debugging Daffodil for schema development.  While DAP does do
> > JVM debugging (through a JDI DAP impl) it also generalizes to many other
> > debugging scenarios.  JDI on the other hand is very tied to the JVM.
> >
> > Extending the JDI appears to be more complex than dealing with DAP, and
> > even though the JDI API is mostly defined with interfaces, there are
> > choke points that limit it to JVM concepts.  For example jdi.Value has a
> > finite set of JVM types that it works with; it's not clear where Daffodil
> > types would plug in, if that is even possible.
> >
> > The final note is that unique Daffodil features wouldn’t get to IDE
> > support any faster with JDI.  In some cases, like VS Code, you would
> > still need an extended DAP to support these features.
> >
> > > and depending on how it shakes out will update the example to show
> > > integration
> >
> > It would appear wise to investigate DAP further.  Next step is to refine
> > these thoughts with a prototype. I started an implementation in the
> > example debugger project [4] to try to run the current example on a
> > _minimal_ DAP implementation.
> >
> >
> > [1] https://microsoft.github.io/debug-adapter-protocol/specification
> > [2] https://github.com/Microsoft/java-debug
> > [3] https://github.com/scalacenter/scala-debug-adapter
> > [4] https://github.com/jw3/example-daffodil-debug
> >
> >
> > On Mon, Apr 12, 2021 at 9:58 AM John Wass <jw...@gmail.com> wrote:
> >
> > > > the code is here https://github.com/jw3/example-daffodil-debug
> > >
> > > There is now a complete console based example for Zio that demonstrates
> > > controlling the debug flow while distributing the current state to
> > > three "displays".
> > > 1. infoset at current step
> > > 2. diff of infoset against previous step
> > > 3. bit position and value of data.
> > >
> > > These displays are very rudimentary but demonstrate the ability to
> > > asynchronously populate multiple views while synchronously controlling
> > > the debug loop.
> > >
> > > > - The new protocol being informed by existing debugger and DAP is key
> > >
> > > Going to look deeper into how DAP might fit with Daffodil, and
> > > depending on how it shakes out will update the example to show
> > > integration.
> > >
> > > Some interesting links to start with
> > > - https://github.com/scalacenter/scala-debug-adapter
> > > - https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
> > > - https://github.com/microsoft/java-debug
> > >
> > > Also looking into the Java Debug Interface (JDI) for comparison.
> > >
> > >
> > > On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:
> > >
> > >> Revisiting this post after doing some debugger related work and
> > >> thinking about debug protocol/adapters to connect external tooling to
> > >> the debug process.
> > >>
> > >> This comment is good
> > >>
> > >> > This also makes me wonder if an approach worth taking for the future
> > >> > of Daffodil schema debugging is developing a sort of "Daffodil Debug
> > >> > Protocol". I imagine it would be loosely based on DAP (which is
> > >> > essentially JSON message based) but could be targeted to the things
> > >> > that a DFDL schema debugger would really need. An added benefit with
> > >> > some sort of protocol is the debugger interface can be uncoupled from
> > >> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
> > >> > language/GUI framework and just have it communicate the protocol over
> > >> > some form of IPC. Another benefit is that any future backends could
> > >> > implement this protocol and so a single debugger could hook into
> > >> > different backends without much issue. Unfortunately, defining such a
> > >> > protocol might be a large task, but we do have our existing debug
> > >> > infrastructure and things like DAP to guide its development/design.
> > >>
> > >> Some thoughts on this
> > >> - Defining the protocol will be a large task, but a minimal version
> > >> should get up and round tripping quickly with a minimal subset of the
> > >> protocol.
> > >> - The new protocol being informed by existing debugger and DAP is key
> > >> - Uncoupling from Daffodil is key
> > >> - Adapt the Daffodil protocol to produce DAP after the fact so as not
> > >> to constrain Daffodil debugging capability
> > >> - We don't need to tie the protocol or adapters to a single framework;
> > >> implementations of the IO layer should be simple enough to support
> > >> multiple things (e.g. Akka, Zio, "basic" ...)
> > >> - The current debugger lives in runtime1, but can we make an abstract
> > >> API that any runtime would implement?
> > >>
> > >> Maybe a solution is structured like this
> > >> - daffodil-debug-api:
> > >>   - protocol model
> > >>   - interfaces: debugger / IO adapter / etc
> > >>   - lives in daffodil repo (new subproject?)
> > >> - daffodil-debug-io-NAME
> > >>   - provides implementation of a specific IO adapter
> > >>   - multiple projects possible (daffodil-debugger-akka,
> > >>     daffodil-debugger-zio, etc)
> > >>   - supported ones live in their own subprojects, but others can be
> > >>     plugged in from external sources
> > >>   - ability to support multiple implementations reduces risk of
> > >>     lock-in
> > >> - debugger applications
> > >>   - maintained in external repositories
> > >>   - depending on the IO implementation these could execute in a
> > >>     separate process or on a separate machine
> > >>   - like Steve said, could be any language / framework
> > >>
> > >> Three types of reference implementations / sample applications could
> > >> also guide the development of the API
> > >>   1. a replacement for the existing TUI debugger, expected to end up
> > >>      with at minimum the same functionality as the current one.
> > >>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
> > >>   3. an IDE integration
> > >>
> > >> Thoughts?
> > >>
> > >> Also I'm working on some reference implementations of these concepts
> > >> using Akka and Zio.  Not quite ready to talk through it yet, but the
> > >> code is here https://github.com/jw3/example-daffodil-debug
> > >>
> > >>
> > >>
> > >> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
> > >> wrote:
> > >>
> > >>> Yep, something like that seems very reasonable for dealing with large
> > >>> infosets. But it still feels like we still run into usability issues.
> > >>> For example, what if a user wants to see more? We need some
> > >>> configuration options to increase what we've elided. It's not big,
> but
> > >>> every new thing that needs configuration adds complexity and
> decreases
> > >>> usability.
> > >>>
> > >>> And I think the only reason we are trying to spend effort eliding
> > >>> things is because we're limited to this gdb-like interface where you
> > can
> > >>> only print out a little information at a time.
> > >>>
> > >>> I think what would really help is to dump this gdb interface and instead
> use
> > >>> multiple windows/views. As a really close example to what I imagine,
> I
> > >>> recently came across this hex editor:
> > >>>
> > >>> https://www.synalysis.net/
> > >>>
> > >>> The screenshots are a bit small so it's not super clear, but this
> tool
> > >>> has one view for the data in hex, and one view for a tree of parsed
> > >>> results (which is very similar to our infoset). The "infoset" view
> has
> > >>> information like offset/length/value, and can be related back to the
> > >>> data view to find the actual bits.
> > >>>
> > >>> I imagine the "next generation daffodil debugger" to look much like
> > >>> this. As data is parsed, the infoset view fills up. This view could
> act
> > >>> like a standard GUI tree so you could collapse sections or scroll
> > around
> > >>> to show just the parts you care about, and have search capabilities
> to
> > >>> quickly jump around. The advantage here is you no longer really need
> > >>> automated eliding or heuristics for what the user *might* care about.
> > >>> You just show the whole thing and let the user scroll around. As daffodil
> > >>> parses and backtracks, this tree grows or shrinks.
> > >>>
> > >>> I also imagine you could have a cursor moving around the hex view, so
> > as
> > >>> daffodil moves around (e.g. scanning for delimiters, extracting
> > >>> integers), one could update this data view to show what daffodil is
> > >>> doing and where it is.
> > >>>
> > >>> I also imagine there could be other views as well. For example, a
> schema
> > >>> view to show where in the schema daffodil is, and to add/remove
> > >>> breakpoints. And an information view for things like variables,
> > in-scope
> > >>> delimiters, PoU's, etc.
> > >>>
> > >>> The only reason I mention a debug protocol is that it would allow this
> GUI
> > >>> to be more easily written in something other than Java/Scala to take
> > >>> advantage of other GUI toolkits. It's been a long while since I've
> done
> > >>> anything with Java GUIs, but they seemed pretty poor the last I
> looked
> > >>> at them. Would even allow for a TUI, which Java has little/no support
> > >>> for. Also enables things like remote debugging if a socket IPC was
> > >>> used. Though I'm not sure all of that is necessary. Just thinking
> what
> > >>> would be ideal, and it can always be pared back.
> > >>>
> > >>>
> > >>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> > >>> > I don't think of it as a daffodil debug protocol, but just a
> > >>> separation of concerns between display of information and the
> > behaviors of
> > >>> parse/unparse that need to be points where users can pause, and data
> > >>> structures available to display.
> > >>> >
> > >>> > E.g., it is 100% a display issue that the infoset (shown as XML) is
> > >>> clumsy, too big, etc.  The infoset is available in the processor
> > state, and
> > >>> one can examine the current node, enclosing node, prior sibling(s),
> > >>> following sibling(s), etc. One can elide contents that are too big
> for
> > >>> hexBinary, etc.
> > >>> >
> > >>> > I think this problem, how to display the infoset with sensible
> limits
> > >>> on sizing, is fairly easy to come up with some design for, that will
> at
> > >>> least be (1) always fairly small (2) much more useful in more cases.
> It
> > >>> won't be perfect but can be much better than what we do now.
> > >>> >
> > >>> > One sensible display "mode" should be that displaying the context
> > >>> surrounding the current element (when parsing or unparsing) displays
> at
> > >>> most N-lines. (N/2 before, N/2 after) with a maximum length of L
> > characters
> > >>> (settable within reason ?)
> > >>> >
> > >>> > Sibling and enclosing nodes would be displayed eliding their
> contents
> > >>> to at most 1 line.
> > >>> >
> > >>> > Here's an example of what I mean. Displaying up to N=10 lines
> total:
> > >>> >
> > >>> > ...
> > >>> > <enclosingParent1>
> > >>> >    ...
> > >>> >    <priorSibling2>89ab782 ...</...>
> > >>> >    <priorSibling1>some text is here and some more text</...>
> > >>> >    <currentNode>value might be some big thing which needs to be
> > elided
> > >>> ...</...>
> > >>> >    <followingSibling1> ... </...>
> > >>> >    ???
> > >>> > </enclosingParent1>
> > >>> > ???
> > >>> >
> > >>> > The </...> is just an idea to reduce XML matching end-tag clutter.
> > >>> >
> > >>> > The ... on a line alone or where element content would appear
> > >>> generally means 1 or more other siblings. The way the display above
> > starts
> > >>> with ... means that this is a relative inner nest, not starting from
> > the
> > >>> absolute root.
> > >>> >
> > >>> > The ... within simple content means that content is elided to fit
> on
> > >>> one line. Always follows some text characters to differentiate from
> the
> > >>> child-element context.
> > >>> >
> > >>> > The ??? means zero or more other siblings.
> > >>> >
> > >>> > I used bold italic above to point out that the current node would
> be
> > >>> highlighted somehow. Probably a way to do this that doesn't require
> > display
> > >>> modes would be useful. E.g., a text marker like ">>>" as in:
> > >>> >
> > >>> >>>> <currentNode>value .... </...>
> > >>> >
> > >>> > might be better, particularly for a trace output being dumped to a
> > >>> text file.
> > >>> >
> > >>> > I made the above example an unparser kind of example by showing a
> > >>> following sibling that exists that is after the current node.
> > >>> >
> > >>> > I think the key concept is that any sibling node is displayed in a
> > way
> > >>> that fits on one line.
> > >>> > E.g., even if the element name was really long, I'd suggest:
> > >>> >
> > >>> >   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> > >>> >
> > >>> > Where the element name itself gets elided because it is too long.
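The one-line rule described above could be sketched roughly as follows. This is only an illustration in Python, not Daffodil code: the function names `elide` and `one_line`, the default width of 60, and the name limit of 24 are all assumptions made for the example.

```python
def elide(text: str, limit: int) -> str:
    """Cut text to at most `limit` characters, marking the cut with '...'."""
    if len(text) <= limit:
        return text
    # Too little room for a marker: just truncate.
    return text[: limit - 3] + "..." if limit > 3 else text[:limit]


def one_line(name: str, value: str, width: int = 60, name_limit: int = 24) -> str:
    """Render a simple element on one line of at most `width` characters.

    Uses the "</...>" end-tag shorthand from the message above. Assumes
    `width` is large enough to hold both tags.
    """
    open_tag = f"<{elide(name, name_limit)}>"
    close_tag = "</...>"
    room = max(width - len(open_tag) - len(close_tag), 0)
    return f"{open_tag}{elide(value, room)}{close_tag}"
```

For instance, `one_line("hereIsAnElementWithASuperLongName", "abcd", name_limit=12)` elides the element name itself, while a long value under a short name is cut so the whole line stays within `width`.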
> > >>> >
> > >>> > A thought. Note that the above presentation is shown as quasi-XML,
> > but
> > >>> there's nothing XML-specific about it. A JSON-friendly equivalent
> > could be
> > >>> done as well:
> > >>> >
> > >>> > enclosingParent1 = {
> > >>> >    ...
> > >>> >    priorSibling2 = "89ab782..."
> > >>> >    priorSibling1 = "some text is here and some more text"
> > >>> >    currentNode = "value might be some big thing which needs to be
> > >>> elided ..."
> > >>> >    followingSibling1 = { ... }
> > >>> >    ???
> > >>> > }
> > >>> >
> > >>> > That's enough for 1 email thread on this debug topic.
> > >>> >
> > >>> >
> > >>> > ________________________________
> > >>> > From: Steve Lawrence <sl...@apache.org>
> > >>> > Sent: Tuesday, January 5, 2021 2:26 PM
> > >>> > To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> > >>> > Subject: The future of the daffodil DFDL schema debugger?
> > >>> >
> > >>> >
> > >>> > Now that we're in a new year, I'd like to start a discussion about
> > the
> > >>> > Daffodil DFDL Schema debugger and how it might be improved to be
> more
> > >>> > useful.
> > >>> >
> > >>> > Note that this is not the capabilities to debug Daffodil itself in
> > >>> > something like Eclipse/IntelliJ, but the ability for Daffodil to
> > >>> provide
> > >>> > enough extra information during a parse/unparse so that a schema
> > >>> > developer can get an idea of what Daffodil is doing. This makes it
> > >>> > easier for users (rather than developers) to determine why a schema
> > >>> > isn't giving the expected parse/unparse result (either because of bad
> > >>> data
> > >>> > or a faulty schema).
> > >>> >
> > >>> > The current state of the debugger is enabled by providing the
> --debug
> > >>> or
> > >>> > --trace flags in the CLI. More information about that here:
> > >>> >
> > >>> > https://daffodil.apache.org/debugger/
> > >>> >
> > >>> > This enables a TUI and commands somewhat similar to GDB, providing
> > >>> things
> > >>> > like breakpoints, steps, displaying the current infoset, display a
> > dump
> > >>> > of the data, etc.
> > >>> >
> > >>> > Although I find this tool pretty useful, it definitely has some
> > glaring
> > >>> > issues.
> > >>> >
> > >>> > The most glaring to me is that it really isn't useful at all for
> > >>> > debugging unparse. The data dumps only include the main
> > output stream,
> > >>> > so determining things like suspensions and buffered output is
> > impossible.
> > >>> >
> > >>> > Another issue is the infoset output. When outputting the infoset,
> the
> > >>> > debugger currently just walks the entire thing and converts it to
> XML
> > >>> > and displays the XML. For large infosets, this is excessive and can
> make
> > >>> it
> > >>> > impossible to use, even with some configurations that limit how much
> > of
> > >>> > that infoset is actually printed to the screen. Also things like
> > large
> > >>> > hex binary blobs create excessive and unusable output.
> > >>> >
> > >>> > Another thing I feel is missing is a schema view. Right now it's
> very
> > >>> > difficult to know where in the schema Daffodil actually is.
> > >>> >
> > >>> > I think these issues just need some thought and improvement. One could
> > >>> > imagine a better way to stringify our unparse buffers for debug.
> One
> > >>> > could imagine a way to receive infoset state changes so the debugger
> > can
> > >>> > track things like backtracks and remove infosets. One could imagine a
> > way
> > >>> > to display the schema.
> > >>> >
> > >>> > We just need a better way to stringify the current state of the
> > unparse
> > >>> > data including buffers, and we need a way for the debugger to
> > >>> receive
> > >>> > state change information about infoset so it can update displays
> > rather
> > >>> > than just constantly printing the entire infoset.
> > >>> >
> > >>> > However, I think another big issue is just usability in
> > general.
> > >>> I
> > >>> > think the CLI usage is reasonable, but it's not always user
> friendly,
> > >>> > and is difficult to view multiple things at the same time. I think
> > >>> > because of this very few people even use this tool. So this seems
> like
> > >>> > perhaps something worth focusing on.
> > >>> >
> > >>> > My first thought to improving this usability issue would be to
> > >>> implement
> > >>> > the Debug Adapter Protocol (DAP)
> > >>> > (https://microsoft.github.io/debug-adapter-protocol/) for
> Daffodil,
> > >>> > which many IDE's implement. With this implemented, Daffodil could
> be
> > >>> > plugged in to any IDE that supports it and essentially get
> debugging
> > >>> for
> > >>> > free, without the need to worry about the GUI elements.
> > >>> >
> > >>> > I do have concerns that this just wouldn't have enough
> functionality
> > >>> > that we'd really need. For example, DAP really only has the ability to
> show
> > >>> > code (Daffodil's equivalent is the DFDL schema). There isn't a way
> to
> > >>> > show a live view of the infoset or data. Most DAP IDE's do have a
> > >>> > console output, so we could potentially make it so the console
> output
> > >>> is
> > >>> > a live view of infoset/data. But I'm not even sure most DAP
> friendly
> > >>> > IDE's could support this kind of console output. Does anyone have
> > >>> > familiarity with DAP IDE's and what kinds of console
> capabilities
> > >>> are
> > >>> > available?
> > >>> >
> > >>> > I also looked into TUI libraries with the idea that we could just
> > >>> extend
> > >>> > our current debugger user interface to be a bit friendlier.
> > >>> > Unfortunately, there aren't too many Java/Scala TUI libraries and
> > those
> > >>> > that do exist don't have Apache friendly licenses. We also want to
> be
> > >>> > careful about increasing dependencies just for a debugger that many
> > >>> people
> > >>> > might not use, so large graphics libraries are probably out of the
> > >>> question.
> > >>> >
> > >>> > This also makes me wonder if an approach worth taking for the
> future
> > of
> > >>> > Daffodil schema debugging is developing a sort of "Daffodil Debug
> > >>> > Protocol". I imagine it would be loosely based on DAP (which is
> > >>> > essentially JSON message based) but could be targeted to the things
> > >>> that
> > >>> > a DFDL schema debugger would really need. An added benefit with
> some
> > >>> > sort of protocol is the debugger interface can be uncoupled from
> > >>> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
> > >>> > language/GUI framework and just have it communicate the protocol
> over
> > >>> > some form of IPC. Another benefit is that any future backends could
> > >>> > implement this protocol and so a single debugger could hook into
> > >>> > different backends without much issue. Unfortunately, defining
> such a
> > >>> > protocol might be a large task, but we do have our existing debug
> > >>> > infrastructure and things like DAP to guide its development/design.
> > >>> >
> > >>> > Thoughts? Does such a Daffodil Debug Protocol seem worth it?
> Perhaps
> > we
> > >>> > really just need the few improvements mentioned to the existing
> > >>> > debugger. Is that enough to make it usable? Or is an entirely
> > different
> > >>> > approach needed to debugging schemas?
> > >>> >
> > >>>
> > >>>
> >
>

Re: The future of the daffodil DFDL schema debugger?

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
I think the point was that to understand debugging in Daffodil, one must understand, and potentially display, the data structures that the runtime maintains.

Furthermore, some of the actions the parser/unparser takes are universal, like invoking a parser. Others require finer detail than that - e.g., delimiter scanning certainly needs more detailed treatment from the debugger.

But a first approximation is that there should be some way to display, inspect, and potentially manipulate each piece of state.
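
To make that first approximation a little more concrete, here is a minimal sketch in Python of a registry that enumerates pieces of state and reports which ones a step changed. It is purely illustrative: `StateItem`, `StateRegistry`, and the example state names are hypothetical and do not correspond to anything in Daffodil's PState or UState.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class StateItem:
    """One piece of runtime state the debugger can show (hypothetical model)."""
    name: str
    read: Callable[[], Any]               # how to fetch the current value
    render: Callable[[Any], str] = repr   # how to turn it into display text


class StateRegistry:
    """Enumerates registered pieces of state and reports what a step changed."""

    def __init__(self) -> None:
        self._items: Dict[str, StateItem] = {}
        self._last: Dict[str, str] = {}

    def register(self, item: StateItem) -> None:
        self._items[item.name] = item

    def snapshot(self) -> Dict[str, str]:
        """Rendered value of every registered piece of state, by name."""
        return {n: i.render(i.read()) for n, i in self._items.items()}

    def changed_since_last_step(self) -> List[str]:
        """Names whose rendered value differs from the previous call."""
        now = self.snapshot()
        changed = [n for n in now if self._last.get(n) != now[n]]
        self._last = now
        return changed
```

A front end would call `changed_since_last_step()` after each parser or unparser step and redraw only the displays for the names it returns. Manipulating state would need a matching "write" callback, which is left out here.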

________________________________
From: John Wass <jw...@gmail.com>
Sent: Wednesday, May 26, 2021 2:46 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: The future of the daffodil DFDL schema debugger?

> Some thoughts re: data format debugger
> I suggest we enumerate

Mike, are you saying there is some groundwork to lay for this in Daffodil
itself, or are these things which the debugger needs to model after
existing concepts?


On Mon, May 24, 2021 at 12:48 PM Beckerle, Mike <
mbeckerle@owlcyberdefense.com> wrote:

> Some thoughts re: data format debugger
>
> I suggest we enumerate
>
>   *   every single piece of state of the parser,
>   *   every single piece of state of the unparser,
>   *   each action/step of the parser,  (every parse combinator or
> primitive, their subactions)
>   *   and of the unparser, (every unparse combinator, primitive,
> suspension,...)
>
> and wire-frame/mock-up some display for each piece of state, and how, if
> changed by a step, the change to that piece of state would be displayed.
>
> We can write down the nuances associated with these data items/actions
> that impact debugger display.
>
> Some of these states/actions will be analogous to things in conventional
> debuggers. (e.g., looking at the values of variables) Others will be
> specific to DFDL needs. (e.g., looking at layers in the data stream,
> visualizing delimiter scanning success/failure, backtracking)
>
> Core concepts a debugger needs are framing vs. content vs. value, and the
> "regions" in the data stream that make these up. The framing includes
> initiators, terminators, separators, alignment regions, prefix-length
> regions, leading/trailing skip regions, unused regions. Those surround the
> content region, and when padding/filling is involved (for simple types that
> are textual) the content region contains leading pad and trailing pad
> regions, surrounding the value region.
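
The nesting described in that paragraph could be modeled along these lines. This is a rough Python sketch under stated assumptions: the `Region` class, the kind names, and the example element (the text "[  42]" with 8-bit characters) are made up for illustration and are not taken from Daffodil.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """A named span of bits in the data stream (hypothetical debugger model)."""
    kind: str            # e.g. "initiator", "content", "value", "terminator"
    start_bit: int
    length_bits: int
    children: List["Region"] = field(default_factory=list)

    @property
    def end_bit(self) -> int:
        return self.start_bit + self.length_bits


def outline(region: Region, depth: int = 0) -> List[str]:
    """Render the nested regions as indented text lines, outermost first."""
    lines = [f"{'  ' * depth}{region.kind} [{region.start_bit}, {region.end_bit})"]
    for child in region.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Text element "[  42]": initiator "[", content "  42" (pad + value), terminator "]".
element = Region("element", 0, 48, [
    Region("initiator", 0, 8),
    Region("content", 8, 32, [
        Region("leadingPad", 8, 16),
        Region("value", 24, 16),
    ]),
    Region("terminator", 40, 8),
])
```

Printing `"\n".join(outline(element))` gives a textual version of the nested boxes; a GUI could use the same tree to highlight each region in a hex view.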
>
> An example of graphical nested box representation of these regions is here
> in a design note about Daffodil:
>
>
> https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/
> (see section "Details of Unique and Shared Regions")
>
> The way to start this effort is to look at the UState and PState classes.
> These are the state blocks. Every piece of these is potentially important
> to the debugger.
>
> Lastly, an important aspect of Daffodil is the streaming behavior of the
> parser and unparser. While I believe it is more important to get something
> working than for it to cover every feature, this is an area where not
> anticipating how it needs to work is likely to lock one out of a future
> scenario that accommodates it.
>
> So the parser doesn't produce an infoset. It produces a stream of infoset
> events, or call-backs to be exact.
> Due to backtracking in the parser, these events can be hung-up for
> substantial time while the parser continues. So we can't assume that there
> is any sort of correlation between parser activity and the producing of
> events.
>
> The unparser doesn't consume an infoset. It consumes a stream of infoset
> events. Specifically, the unparser is the callback-handler for unparse
> infoset events.
>
> The infoset gets trimmed so that we needn't build up the complete infoset
> tree in memory. As parse-events are produced, no-longer necessary parts of
> the infoset are pruned away. Similarly, when unparsing, once a part of the
> infoset has been unparsed, that part of the infoset tree is pruned away if
> no longer needed.
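
One way to picture why events get hung up by backtracking is a buffer that only releases events once no point of uncertainty is still open. The Python sketch below is an assumption-laden illustration, not how Daffodil implements it: `EventBuffer` and its method names are hypothetical.

```python
from typing import Callable, List, Tuple

Event = Tuple[str, str]  # (kind, element name), e.g. ("start", "record")


class EventBuffer:
    """Holds infoset events while any point of uncertainty (PoU) is unresolved."""

    def __init__(self, deliver: Callable[[Event], None]) -> None:
        self._deliver = deliver
        self._pending: List[Event] = []
        self._marks: List[int] = []   # index into _pending for each open PoU

    def emit(self, event: Event) -> None:
        if self._marks:
            self._pending.append(event)   # might still be backtracked
        else:
            self._deliver(event)          # nothing can undo it; stream it out

    def mark(self) -> None:
        """Enter a PoU: events emitted from now on may be discarded."""
        self._marks.append(len(self._pending))

    def backtrack(self) -> None:
        """The speculative branch failed: drop events produced since the mark."""
        del self._pending[self._marks.pop():]

    def resolve(self) -> None:
        """The branch succeeded; flush once no outer PoU remains."""
        self._marks.pop()
        if not self._marks:
            for event in self._pending:
                self._deliver(event)
            self._pending.clear()
```

A debugger built on something like this would have to show two things separately: the parser's current (speculative) position, and the events that have actually been delivered downstream.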
>
>
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Thursday, April 22, 2021 9:32 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> Some thoughts related to showing the infoset as if it were a variable as
> this is prototyped
>
> 1) How do DAP/IDE's represent very large hierarchical data? Infosets can
> be huge, and most of the time a user only cares about the most recent
> infoset item. So some way to follow and show just the most recent part of
> the infoset is important. The current Daffodil debugger has an
> "infosetLines" setting so that it only shows the most recent X number of
> lines, which is mostly all a user cares about when stepping through a parse.
>
> 2) Infoset items are added and removed very frequently during a parse.
> Currently, when the Daffodil debugger shows the infoset it just converts
> the entire thing to XML and displays that. This doesn't work at all for
> large infosets since this can take a long time. I was hoping this issue
> would get resolved with this new debugging infrastructure. When the
> infoset is modified, we ideally want a way to specify via DAP that parts
> of the variable hierarchy were added/removed rather than having to send
> the entire infoset during every variable update.
>
> 3) I can imagine a feature where a user would want to select an infoset
> item and jump to the associated schema element, or query information
> about that infoset item (e.g., what bit position did it start at, what
> was the length). We don't have this right now, but it would be really nice
> to have. This suggests that we need metadata associated with each of the
> variables. Does DAP have a concept of that and do IDE's have a way to
> show it?
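
Points 2 and 3 could be sketched together as below: each infoset node carries some metadata, and only the differences between two steps are sent to the front end. This is a hedged Python illustration; the `NodeInfo` field names, the path keys, and the op names are assumptions, and whether DAP can carry such deltas is exactly the open question above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NodeInfo:
    """Per-node metadata a front end might want (field names are assumptions)."""
    value: str
    start_bit: int
    length_bits: int
    schema_location: str   # e.g. "format.dfdl.xsd:42"


Snapshot = Dict[str, NodeInfo]   # keyed by infoset path, e.g. "/root/hdr/len"


def delta(before: Snapshot, after: Snapshot) -> List[Tuple[str, str]]:
    """Return (op, path) pairs describing how to turn `before` into `after`."""
    ops: List[Tuple[str, str]] = []
    for path in sorted(before.keys() - after.keys()):
        ops.append(("remove", path))
    for path in sorted(after.keys() - before.keys()):
        ops.append(("add", path))
    for path in sorted(before.keys() & after.keys()):
        if before[path] != after[path]:
            ops.append(("update", path))
    return ops
```

Comparing full snapshots is itself linear in the infoset size, so a real implementation would more likely have the runtime report changes as they happen; the sketch only shows the shape of the messages.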
>
> On 4/21/21 7:52 PM, Adam Rosien wrote:
> > I've been reading up on DAP and wanted to share...
> >
> >> There are many areas though that are unique to Daffodil that have no
> > representation in the spec.  These things (like InputStream, Infoset,
> PoU,
> > different variable types, backtracking, etc) will need an extension to
> > DAP.  This really boils down to defining these things to fit under the
> DAP
> > BaseProtocol and enabling handling of those objects on both the front and
> > back ends.
> >
> > To me, much of the current state exposed by the (Daffodil) Debugger
> > translates directly to a DAP Variable[1]. DAP Variables can be
> > nested/hierarchical, so they could (potentially) model larger data like
> the
> > infoset. I can imagine shoving all the current state into Variables as a
> > proof-of-concept.
> >
> > It also seems like the processing stack maintained by the Daffodil
> PState,
> > where each item references the relevant schema element, could translate
> to
> > the DAP StackFrame type [2]. That is, the path from the schema root to
> the
> > currently processing schema element becomes the "call stack". (Apologies
> if
> > I don't have all the Daffodil terms lined up correctly.)
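
The StackFrame idea might look roughly like this in Python. The dictionary keys (`id`, `name`, `source`, `line`, `column`, `stackFrames`, `totalFrames`) follow the DAP specification; everything else, including the `SchemaPath` shape and the file names, is a made-up illustration rather than anything Daffodil provides.

```python
from typing import Dict, List, Tuple

# (element name, schema file, line) from the schema root down to the current element.
SchemaPath = List[Tuple[str, str, int]]


def to_stack_frames(path: SchemaPath) -> List[Dict[str, object]]:
    """Build DAP StackFrame objects, innermost element first."""
    frames: List[Dict[str, object]] = []
    for depth, (name, file, line) in enumerate(path):
        frames.append({
            "id": depth,
            "name": name,
            "source": {"path": file},
            "line": line,
            "column": 1,   # column is not tracked in this sketch
        })
    frames.reverse()
    return frames


def stack_trace_body(path: SchemaPath) -> Dict[str, object]:
    """Body of a DAP `stackTrace` response for the given schema path."""
    frames = to_stack_frames(path)
    return {"stackFrames": frames, "totalFrames": len(frames)}
```

With this mapping an IDE's ordinary call-stack view would show the enclosing schema elements, and selecting a frame would jump to that element's line in the schema file.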
> >
> > For displaying the input data and processing progress, I looked at a few
> > existing VS Code extensions that provided non-builtin views, some of
> which
> > interact with their DAP debugger code [3] [4] [5] [6].
> >
> > Finally, I took a cursory look at scala-debug-adapter [7], which, for
> > reference, wraps Microsoft's java-debug implementation of DAP. I was
> > curious about the set of request/response and event types. Additionally,
> > the TypeScript API to VS Code offers custom DAP requests and responses,
> but
> > I couldn't find the equivalent notion in the java-debug project.
> >
> > .. Adam
> >
> > [1]
> >
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
> > [2]
> >
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_StackFrame
> > [3] https://github.com/scalameta/metals-vscode (provides a debugger and
> > non-debugger custom UI)
> > [4] https://github.com/microsoft/vscode-cpptools (debugger + memory
> view)
> > [5]
> https://marketplace.visualstudio.com/items?itemName=marus25.cortex-debug
> > (debugger + memory view,
> >
> https://github.com/Marus/cortex-debug/blob/master/src/frontend/memory_content_provider.ts
> > )
> > [6]
> >
> https://marketplace.visualstudio.com/items?itemName=slevesque.vscode-hexdump
> > (extension for hexdumps that could be controlled by other extensions)
> > [7] https://github.com/scalacenter/scala-debug-adapter
> > [8] https://github.com/microsoft/java-debug
> >
> > On Tue, Apr 20, 2021 at 7:08 AM John Wass <jw...@gmail.com> wrote:
> >
> >>> Going to look deeper into how DAP might fit with Daffodil
> >>
> >> Have been looking over DAP and getting a good feeling about it. The
> >> specification [1] seems general enough that it could be applied to
> Daffodil
> >> and cover a swath of common operations (like start, stop, break,
> continue,
> >> code locations, variables, etc).
> >>
> >> There are many areas though that are unique to Daffodil that have no
> >> representation in the spec.  These things (like InputStream, Infoset,
> PoU,
> >> different variable types, backtracking, etc) will need an extension to
> >> DAP.  This really boils down to defining these things to fit under the
> DAP
> >> BaseProtocol and enabling handling of those objects on both the front
> and
> >> back ends.
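
As an illustration of fitting a Daffodil-specific object under the DAP base protocol, here is a small Python sketch that frames a custom event the way DAP frames every message (a Content-Length header, a blank line, then a JSON body with `seq`, `type`, and `event`). The event name `daffodil.infoset` and its body fields are hypothetical; only the framing and the top-level message fields come from the DAP specification.

```python
import json
from typing import Any, Dict, Tuple


def encode_event(seq: int, event: str, body: Dict[str, Any]) -> bytes:
    """Frame a DAP event message: Content-Length header, blank line, JSON payload."""
    payload = json.dumps(
        {"seq": seq, "type": "event", "event": event, "body": body}
    ).encode("utf-8")
    header = f"Content-Length: {len(payload)}\r\n\r\n".encode("ascii")
    return header + payload


def decode_message(data: bytes) -> Tuple[Dict[str, Any], bytes]:
    """Decode one framed message; return it plus any bytes left over."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = int(header.split(b":", 1)[1].strip().decode("ascii"))
    return json.loads(rest[:length].decode("utf-8")), rest[length:]
```

For example, `encode_event(1, "daffodil.infoset", {"op": "add", "path": "/root/hdr"})` produces bytes that any DAP-style reader can split off the stream, even if it then ignores an event name it does not recognize.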
> >>
> >> On the backend we need a Daffodil DAP protocol server.  Existing JVM
> >> implementations (like Java [2], Scala [3]) are tied closely to JDI and
> >> would bring a lot of extra baggage to work around that.  Developing a
> >> Daffodil specific implementation is no small task, but feasible.  There
> are
> >> several existing implementations on the JVM that are close and can be
> >> looked at for reference.
> >>
> >> The backend implementation would look similar to what was described in
> an
> >> earlier post.  We could use ZIO/Akka/etc to implement the backend
> Protocol
> >> Server to enable the IO between the Daffodil process and the DAP
> clients.
> >> This implementation would now be guided by the DAP specification.
> >>
> >> With the protocol and backend extended to fit Daffodil that leaves the
> >> frontend.  In theory an existing IDE plugin should get pretty close to
> >> being able to perform the common debug operations mentioned above.  To
> >> support the Daffodil extensions there will need to be handling of the
> >> extended protocol into whatever views are desired/applicable.
> >>
> >>> Also looking into the Java Debug Interface (JDI) for comparison.
> >>
> >> JDI appears to be the wrong level of abstraction for what we are talking
> >> about in debugging Daffodil for schema development.  While DAP does do
> JVM
> >> debugging (through a JDI DAP impl) it also generalizes to many other
> >> debugging scenarios.  JDI on the other hand is very tied to the JVM.
> >>
> >> Extending the JDI appears to be more complex than dealing with DAP, and
> >> even though the JDI API is mostly defined with interfaces, there are
> choke
> >> points that limit to JVM concepts.  For example jdi.Value has a finite
> set
> >> of JVM types that it works with; it's not clear where Daffodil types
> would
> >> plug in, if that is even possible.
> >>
> >> The final note is that unique Daffodil features wouldn’t get to IDE
> support
> >> any faster with JDI.  In some cases, like VS Code, you would still need an
> >> extended DAP to support these features.
> >>
> >>> and depending on how it shakes out will update the example to show
> >> integration
> >>
> >> It would appear wise to investigate DAP further.  Next step is to refine
> >> these thoughts with a prototype. I started an implementation in the
> example
> >> debugger project [4] to try to run the current example on a _minimal_
> DAP
> >> implementation.
> >>
> >>
> >> [1] https://microsoft.github.io/debug-adapter-protocol/specification
> >> [2] https://github.com/Microsoft/java-debug
> >> [3] https://github.com/scalacenter/scala-debug-adapter
> >> [4] https://github.com/jw3/example-daffodil-debug
> >>
> >>
> >> On Mon, Apr 12, 2021 at 9:58 AM John Wass <jw...@gmail.com> wrote:
> >>
> >>>> the code is here https://github.com/jw3/example-daffodil-debug
> >>>
> >>> There is now a complete console based example for Zio that demonstrates
> >>> controlling the debug flow while distributing the current state to
> three
> >>> "displays".
> >>> 1. infoset at current step
> >>> 2. diff of infoset against previous step
> >>> 3. bit position and value of data.
> >>>
> >>> These displays are very rudimentary but demonstrate the ability to
> >>> asynchronously populate multiple views while synchronously controlling
> >> the
> >>> debug loop.
> >>>
> >>>> - The new protocol being informed by existing debugger and DAP is key
> >>>
> >>> Going to look deeper into how DAP might fit with Daffodil, and
> depending
> >>> on how it shakes out will update the example to show integration.
> >>>
> >>> Some interesting links to start with
> >>> - https://github.com/scalacenter/scala-debug-adapter
> >>> -
> >>>
> >>
> https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
> >>> - https://github.com/microsoft/java-debug
> >>>
> >>> Also looking into the Java Debug Interface (JDI) for comparison.
> >>>
> >>>
> >>> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:
> >>>
> >>>> Revisiting this post after doing some debugger related work and
> thinking
> >>>> about debug protocol/adapters to connect external tooling to the debug
> >>>> process.
> >>>>
> >>>> This comment is good
> >>>>
> >>>>> This also makes me wonder if an approach worth taking for the future
> >> of
> >>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
> >>>> Protocol". I imagine it would be loosely based on DAP (which is
> >>>> essentially JSON message based) but could be targeted to the things
> >> that a
> >>>> DFDL schema debugger would really need. An added benefit with some
> >> sort of
> >>>> protocol is the debugger interface can be uncoupled from Daffodil
> >>>> itself, so we could implement a TUI/GUI/whatever in any  language/GUI
> >>>> framework and just have it communicate the protocol over some form of
> >>>> IPC. Another benefit is that any future backends could implement this
> >>>> protocol and so a single debugger could hook into different backends
> >>>> without much issue. Unfortunately, defining such a protocol might be a
> >>>> large task, but we do have our existing debug infrastructure and
> things
> >>>> like DAP to guide its development/design.
> >>>>
> >>>> Some thoughts on this
> >>>> - Defining the protocol will be a large task, but a minimal version
> >>>> should get up and round tripping quickly with a minimal subset of the
> >>>> protocol.
> >>>> - The new protocol being informed by existing debugger and DAP is key
> >>>> - Uncoupling from Daffodil is key
> >>>> - Adapt the Daffodil protocol to produce DAP after the fact so as not
> to
> >>>> constrain Daffodil debugging capability
> >>>> - We don't need to tie the protocol or adapters to a single framework,
> >>>> implementations of the IO layer should be simple enough to support
> >> multiple
> >>>> things (eg Akka, Zio, "basic" ...)
> >>>> - The current debugger lives in runtime1, but can we make an abstract
> >> API
> >>>> that any runtime would implement?
> >>>>
> >>>> Maybe a solution is structured like this
> >>>> - daffodil-debug-api:
> >>>>   - protocol model
> >>>>   - interfaces: debugger / IO adapter / etc
> >>>>   - lives in daffodil repo (new subproject?)
> >>>> - daffodil-debug-io-NAME
> >>>>   - provides implementation of a specific IO adapter
> >>>>   - multiple projects possible (daffodil-debugger-akka,
> >>>> daffodil-debugger-zio, etc)
> >>>>   - supported ones live in their own subprojects, but others can be
> >>>> plugged in from external sources
> >>>>   - ability to support multiple implementations reduces risk of
> lock-in
> >>>> - debugger applications
> >>>>   - maintained in external repositories
> >>>>   - depending on the IO implementation these could execute in a
> >> separate
> >>>> process or on a separate machine
> >>>>   - like Steve said, could be any language / framework
> >>>>
> >>>> Three types of reference implementations / sample applications could
> >> also
> >>>> guide the development of the API
> >>>>   1. a replacement for the existing TUI debugger, expected to end up
> >> with
> >>>> at minimum the same functionality as the current one.
> >>>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
> >>>>   3. an IDE integration
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Also I'm working on some reference implementations of these concepts
> >>>> using Akka and Zio.  Not quite ready to talk through it yet, but the
> >> code
> >>>> is here https://github.com/jw3/example-daffodil-debug
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Yep, something like that seems very reasonable for dealing with large
> >>>>> infosets. But it still feels like we still run into usability issues.
> >>>>> For example, what if a user wants to see more? We need some
> >>>>> configuration options to increase what we've elided. It's not big,
> but
> >>>>> every new thing that needs configuration adds complexity and
> decreases
> >>>>> usability.
> >>>>>
> >>>>> And I think the only reason we are trying to spend effort eliding
> >>>>> things is because we're limited to this gdb-like interface where you
> >> can
> >>>>> only print out a little information at a time.
> >>>>>
> >>>>> I think what would really help is to dump this gdb interface and instead
> use
> >>>>> multiple windows/views. As a really close example to what I imagine,
> I
> >>>>> recently came across this hex editor:
> >>>>>
> >>>>> https://www.synalysis.net/
> >>>>>
> >>>>> The screenshots are a bit small so it's not super clear, but this
> tool
> >>>>> has one view for the data in hex, and one view for a tree of parsed
> >>>>> results (which is very similar to our infoset). The "infoset" view
> has
> >>>>> information like offset/length/value, and can be related back to the
> >>>>> data view to find the actual bits.
> >>>>>
> >>>>> I imagine the "next generation daffodil debugger" to look much like
> >>>>> this. As data is parsed, the infoset view fills up. This view could
> act
> >>>>> like a standard GUI tree so you could collapse sections or scroll
> >> around
> >>>>> to show just the parts you care about, and have search capabilities
> to
> >>>>> quickly jump around. The advantage here is you no longer really need
> >>>>> automated eliding or heuristics for what the user *might* care about.
> >>>>> You just show the whole thing and let user scroll around. As daffodil
> >>>>> parses and backtracks, this tree grows or shrinks.
> >>>>>
> >>>>> I also imagine you could have a cursor moving around the hex view, so
> >> as
> >>>>> daffodil moves around (e.g. scanning for delimiters, extracting
> >>>>> integers), one could update this data view to show what daffodil is
> >>>>> doing and where it is.
> >>>>>
> >>>>> I also imagine there could be other views as well. For example, a
> schema
> >>>>> view to show where in the schema daffodil is, and to add/remove
> >>>>> breakpoints. And an information view for things like variables,
> >> in-scope
> >>>>> delimiters, PoU's, etc.
> >>>>>
> >>>>> The only reason I mention a debug protocol is that it would allow this
> GUI
> >>>>> to be more easily written in something other than Java/Scala to take
> >>>>> advantage of other GUI toolkits. It's been a long while since I've
> done
> >>>>> anything with Java GUIs, but they seemed pretty poor the last time I
> looked
> >>>>> at them. It would even allow for a TUI, which Java has little/no support
> >>>>> for. It also enables things like remote debugging if socket IPC were
> >>>>> used. Though I'm not sure all of that is necessary. Just thinking
> what
> >>>>> would be ideal, and it can always be pared back.
> >>>>>
> >>>>>
> >>>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> >>>>>> I don't think of it as a daffodil debug protocol, but just a
> >>>>> separation of concerns between display of information and the
> >> behaviors of
> >>>>> parse/unparse that need to be points where users can pause, and data
> >>>>> structures available to display.
> >>>>>>
> >>>>>> E.g., it is 100% a display issue that the infoset (shown as XML) is
> >>>>> clumsy, too big, etc.  The infoset is available in the processor
> >> state, and
> >>>>> one can examine the current node, enclosing node, prior sibling(s),
> >>>>> following sibling(s), etc. One can elide contents that are too big
> for
> >>>>> hexBinary, etc.
> >>>>>>
> >>>>>> I think this problem, how to display the infoset with sensible
> limits
> >>>>> on sizing, is fairly easy to come up with some design for, that will
> at
> >>>>> least be (1) always fairly small (2) much more useful in more cases.
> It
> >>>>> won't be perfect but can be much better than what we do now.
> >>>>>>
> >>>>>> One sensible display "mode" should be that displaying the context
> >>>>> surrounding the current element (when parsing or unparsing) displays
> at
> >>>>> most N-lines. (N/2 before, N/2 after) with a maximum length of L
> >> characters
> >>>>> (settable within reason ?)
> >>>>>>
> >>>>>> Sibling and enclosing nodes would be displayed eliding their
> contents
> >>>>> to at most 1 line.
> >>>>>>
> >>>>>> Here's an example of what I mean. Displaying up to M=10 lines total:
> >>>>>>
> >>>>>> ...
> >>>>>> <enclosingParent1>
> >>>>>>    ...
> >>>>>>    <priorSibling2>89ab782 ...</...>
> >>>>>>    <priorSibling1>some text is here and some more text</...>
> >>>>>>    <currentNode>value might be some big thing which needs to be
> >> elided
> >>>>> ...</...>
> >>>>>>    <followingSibling1> ... </...>
> >>>>>>    ???
> >>>>>> </enclosingParent1>
> >>>>>> ???
> >>>>>>
> >>>>>> The </...> is just an idea to reduce XML matching end-tag clutter.
> >>>>>>
> >>>>>> The ... on a line alone or where element content would appear
> >>>>> generally means 1 or more other siblings. The way the display above
> >> starts
> >>>>> with ... means that this is a relative inner nest, not starting from
> >> the
> >>>>> absolute root.
> >>>>>>
> >>>>>> The ... within simple content means that content is elided to fit on
> >>>>> one line. Always follows some text characters to differentiate from
> the
> >>>>> child-element context.
> >>>>>>
> >>>>>> The ??? means zero or more other siblings.
> >>>>>>
> >>>>>> I used bold italic above to point out that the current node would be
> >>>>> highlighted somehow. Probably a way to do this that doesn't require
> >> display
> >>>>> modes would be useful. E.g., a text marker like ">>>" as in:
> >>>>>>
> >>>>>>>>> <currentNode>value .... </...>
> >>>>>>
> >>>>>> might be better, particularly for a trace output being dumped to a
> >>>>> text file.
> >>>>>>
> >>>>>> I made the above example an unparser kind of example by showing a
> >>>>> following sibling that exists that is after the current node.
> >>>>>>
> >>>>>> I think the key concept is that any sibling node is displayed in a
> >> way
> >>>>> that fits on one line.
> >>>>>> E.g., even if the element name was really long, I'd suggest:
> >>>>>>
> >>>>>>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> >>>>>>
> >>>>>> Where the element name itself gets elided because it is too long.
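The one-line rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `elide_element`, `max_len`, and `max_name` are hypothetical names and limits, not part of Daffodil or its debugger.

```python
def elide_element(name, value, max_len=40, max_name=20):
    """Render one sibling as quasi-XML that fits within max_len characters.

    Hypothetical helper: the names and default limits are assumptions
    made for illustration only.
    """
    # Elide an overly long element name first, as in the example above.
    if len(name) > max_name:
        name = name[:max_name] + "..."
    open_tag = "<" + name + ">"
    close_tag = "</...>"  # abbreviated end tag to reduce clutter
    room = max_len - len(open_tag) - len(close_tag)
    # Elide simple content; " ..." follows some text so it is distinguishable
    # from a bare "..." that stands for elided child elements.
    if len(value) > room:
        value = value[:max(room - 4, 0)] + " ..."
    return open_tag + value + close_tag


print(elide_element("priorSibling1", "abc"))
print(elide_element("currentNode", "value might be some big thing which needs to be elided"))
```

Short values pass through unchanged, while long values are cut so the whole line stays within `max_len`.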
> >>>>>>
> >>>>>> A thought. Note that the above presentation is shown as quasi-XML,
> >> but
> >>>>> there's nothing XML-specific about it. A JSON-friendly equivalent
> >> could be
> >>>>> done as well:
> >>>>>>
> >>>>>> enclosingParent1 = {
> >>>>>>    ...
> >>>>>>    priorSibling2 = "89ab782..."
> >>>>>>    priorSibling1 = "some text is here and some more text"
> >>>>>>    currentNode = "value might be some big thing which needs to be
> >>>>> elided ..."
> >>>>>>    followingSibling1 = { ... }
> >>>>>>    ???
> >>>>>> }
> >>>>>>
> >>>>>> That's enough for 1 email thread on this debug topic.
> >>>>>>
> >>>>>>
> >>>>>> ________________________________
> >>>>>> From: Steve Lawrence <sl...@apache.org>
> >>>>>> Sent: Tuesday, January 5, 2021 2:26 PM
> >>>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> >>>>>> Subject: The future of the daffodil DFDL schema debugger?
> >>>>>>
> >>>>>>
> >>>>>> Now that we're in a new year, I'd like to start a discussion about
> >> the
> >>>>>> Daffodil DFDL Schema debugger and how it might be improved to be
> more
> >>>>>> useful.
> >>>>>>
> >>>>>> Note that this is not the capabilities to debug Daffodil itself in
> >>>>>> something like Eclipse/IntelliJ, but the ability for Daffodil to
> >>>>> provide
> >>>>>> enough extra information during a parse/unparse so that a schema
> >>>>>> developer can get an idea of what Daffodil is doing. This makes it
> >>>>>> easier for users (rather than developers) to determine why a schema
> >>>>>> isn't giving the expected parse/unparse result (either because of bad
> >>>>> data
> >>>>>> or a faulty schema).
> >>>>>>
> >>>>>> The current state of the debugger is enabled by providing the
> --debug
> >>>>> or
> >>>>>> --trace flags in the CLI. More information about that here:
> >>>>>>
> >>>>>> https://daffodil.apache.org/debugger/
> >>>>>>
> >>>>>> This enables a TUI and commands somewhat similar to GDB, providing
> >>>>> things
> >>>>>> like breakpoints, steps, displaying the current infoset, display a
> >> dump
> >>>>>> of the data, etc.
> >>>>>>
> >>>>>> Although I find this tool pretty useful, it definitely has some
> >> glaring
> >>>>>> issues.
> >>>>>>
> >>>>>> The most glaring to me is that it really isn't useful at all for
> >>>>>> debugging unparse. The data dumps only include the main
> >> output stream,
> >>>>>> so determining things like suspensions and buffered output is
> >> impossible.
> >>>>>>
> >>>>>> Another issue is the infoset output. When outputting the infoset,
> the
> >>>>>> debugger currently just walks the entire thing and converts it to
> XML
> >>>>>> and displays the XML. For large infosets, this is excessive and can
> make
> >>>>> it
> >>>>>> impossible to use, even with some configurations that limit how much
> >> of
> >>>>>> that infoset is actually printed to the screen. Also things like
> >> large
> >>>>>> hex binary blobs create excessive and unusable output.
> >>>>>>
> >>>>>> Another thing I feel is missing is a schema view. Right now it's
> very
> >>>>>> difficult to know where in the schema Daffodil actually is.
> >>>>>>
> >>>>>> I think these issues just need some thought improvement. One could
> >>>>>> imagine a better way to stringify our unparse buffers for debug. One
> >>>>>> could imagine a way to receive infoset state changes so the debugger
> >> can
> >>>>>> track things like backtracks and remove infosets. One could imagine a
> >> way
> >>>>>> to display the schema.
> >>>>>>
> >>>>>> We just need a better way to stringify the current state of the
> >> unparse
> >>>>>> data including buffers, and we need a way for the debugger to
> >>>>> receive
> >>>>>> state change information about infoset so it can update displays
> >> rather
> >>>>>> than just constantly printing the entire infoset.
> >>>>>>
> >>>>>> However, I think another big issue is just usability in
> >> general.
> >>>>> I
> >>>>>> think the CLI usage is reasonable, but it's not always user
> friendly,
> >>>>>> and is difficult to view multiple things at the same time. I think
> >>>>>> because of this very few people even use this tool. So this seems
> like
> >>>>>> perhaps something worth focusing on.
> >>>>>>
> >>>>>> My first thought to improving this usability issue would be to
> >>>>> implement
> >>>>>> the Debug Adapter Protocol (DAP)
> >>>>>> (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
> >>>>>> which many IDE's implement. With this implemented, Daffodil could be
> >>>>>> plugged in to any IDE that supports it and essentially get debugging
> >>>>> for
> >>>>>> free, without the need to worry about the GUI elements.
> >>>>>>
> >>>>>> I do have concerns that this just wouldn't have enough functionality
> >>>>>> that we'd really need. For example, DAP really only has the ability to show
> >>>>>> code (Daffodil's equivalent is the DFDL schema). There isn't a way
> to
> >>>>>> show a live view of the infoset or data. Most DAP IDE's do have a
> >>>>>> console output, so we could potentially make it so the console
> output
> >>>>> is
> >>>>>> a live view of infoset/data. But I'm not even sure most DAP friendly
> >>>>>> IDEs could support this kind of console output. Does anyone have
> >>>>>> familiarity with DAP IDEs and what kinds of console capabilities
> >>>>> are
> >>>>>> available?
> >>>>>>
> >>>>>> I also looked into TUI libraries with the idea that we could just
> >>>>> extend
> >>>>>> our current debugger user interface to be a bit friendlier.
> >>>>>> Unfortunately, there aren't too many Java/Scala TUI libraries and
> >> those
> >>>>>> that do exist don't have Apache friendly licenses. We also want to
> be
> >>>>>> careful about increasing dependencies just for a debugger that many
> >>>>> people
> >>>>>> might not use, so large graphics libraries are probably out of the
> >>>>> question.
> >>>>>>
> >>>>>> This also makes me wonder if an approach worth taking for the future
> >> of
> >>>>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
> >>>>>> Protocol". I imagine it would be loosely based on DAP (which is
> >>>>>> essentially JSON message based) but could be targeted to the things
> >>>>> that
> >>>>>> a DFDL schema debugger would really need. An added benefit with some
> >>>>>> sort of protocol is the debugger interface can be uncoupled from
> >>>>>> Daffodil itself, so we could implement a TUI/GUI/whatever in any
> >>>>>> language/GUI framework and just have it communicate the protocol
> over
> >>>>>> some form of IPC. Another benefit is that any future backends could
> >>>>>> implement this protocol and so a single debugger could hook into
> >>>>>> different backends without much issue. Unfortunately, defining such
> a
> >>>>>> protocol might be a large task, but we do have our existing debug
> >>>>>> infrastructure and things like DAP to guide its development/design.
> >>>>>>
> >>>>>> Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
> >> we
> >>>>>> really just need the few improvements mentioned to the existing
> >>>>>> debugger. Is that enough to make it usable? Or is an entirely
> >> different
> >>>>>> approach needed to debugging schemas?
> >>>>>>
> >>>>>
> >>>>>
> >>
> >
>
>

Re: The future of the daffodil DFDL schema debugger?

Posted by John Wass <jw...@gmail.com>.
> Some thoughts re: data format debugger
> I suggest we enumerate

Mike, are you saying there is some groundwork to lay for this in Daffodil
itself, or are these things which the debugger needs to model after
existing concepts?


On Mon, May 24, 2021 at 12:48 PM Beckerle, Mike <
mbeckerle@owlcyberdefense.com> wrote:

> Some thoughts re: data format debugger
>
> I suggest we enumerate
>
>   *   every single piece of state of the parser,
>   *   every single piece of state of the unparser,
>   *   each action/step of the parser,  (every parse combinator or
> primitive, their subactions)
>   *   and of the unparser, (every unparse combinator, primitive,
> suspension,...)
>
> and wire-frame/mock-up some display for each piece of state, and how, if
> changed by a step, the change to that piece of state would be displayed.
>
> We can write down the nuances associated with these data items/actions
> that impact debugger display.
>
> Some of these states/actions will be analogous to things in conventional
> debuggers. (e.g., looking at the values of variables) Others will be
> specific to DFDL needs. (e.g., looking at layers in the data stream,
> visualizing delimiter scanning success/failure, backtracking)
>
> Core concepts a debugger needs are framing vs. content vs. value, and the
> "regions" in the data stream that make these up. The framing includes
> initiators, terminators, separators, alignment regions, prefix-length
> regions, leading/trailing skip regions, unused regions. Those surround the
> content region, and when padding/filling is involved (for simple types that
> are textual) the content region contains leading pad and trailing pad
> regions, surrounding the value region.
>
> An example of graphical nested box representation of these regions is here
> in a design note about Daffodil:
>
>
> https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/
> (see section "Details of Unique and Shared Regions")
>
> The way to start this effort is to look at the UState and PState classes.
> These are the state blocks. Every piece of these is potentially important
> to the debugger.
>
> Lastly, an important aspect of Daffodil is the streaming behavior of the
> parser and unparser. While I believe it is more important to get something
> working than for it to cover every feature, this is an area where not
> anticipating how it needs to work is likely to lock one out of a future
> scenario that accommodates it.
>
> So the parser doesn't produce an infoset. It produces a stream of infoset
> events, or call-backs to be exact.
> Due to backtracking in the parser, these events can be hung-up for
> substantial time while the parser continues. So we can't assume that there
> is any sort of correlation between parser activity and the producing of
> events.
>
> The unparser doesn't consume an infoset. It consumes a stream of infoset
> events. Specifically, the unparser is the callback-handler for unparse
> infoset events.
>
> The infoset gets trimmed so that we needn't build up the complete infoset
> tree in memory. As parse-events are produced, no-longer necessary parts of
> the infoset are pruned away. Similarly, when unparsing, once a part of the
> infoset has been unparsed, that part of the infoset tree is pruned away if
> no longer needed.
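The hold-up of infoset events during backtracking described above can be illustrated with a small Python sketch. It is only a model under stated assumptions: `EventBuffer`, `mark`, `resolve`, and `backtrack` are hypothetical names and do not correspond to Daffodil classes or methods. Events become visible to the consumer only when no point of uncertainty is still open, which is why parser activity and event delivery are not correlated in time.

```python
class EventBuffer:
    """Holds infoset events while any point of uncertainty (PoU) is unresolved.

    Hypothetical model for illustration; not a Daffodil API.
    """

    def __init__(self):
        self.pending = []   # events that could still be discarded
        self.marks = []     # index into `pending` for each open PoU
        self.released = []  # events already delivered to the consumer

    def _flush(self):
        # Events are safe to deliver only when no PoU is open.
        if not self.marks:
            self.released.extend(self.pending)
            self.pending.clear()

    def emit(self, event):
        self.pending.append(event)
        self._flush()

    def mark(self):
        # Open a PoU: later events may be thrown away on backtrack.
        self.marks.append(len(self.pending))

    def resolve(self):
        # The innermost PoU succeeded.
        self.marks.pop()
        self._flush()

    def backtrack(self):
        # The innermost PoU failed: drop everything emitted since it opened.
        del self.pending[self.marks.pop():]
        self._flush()


buf = EventBuffer()
buf.emit("start message")
buf.mark()
buf.emit("start branchA")   # held: the choice is not yet resolved
buf.backtrack()             # branchA failed, its event is discarded
buf.mark()
buf.emit("start branchB")
buf.resolve()               # branchB succeeded, its event is released
print(buf.released)
```

A debugger view fed only by released events would lag behind the parser, so a live view likely needs access to the pending events as well.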
>
>
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Thursday, April 22, 2021 9:32 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> Some thoughts related to showing the infoset as if it were a variable as
> this is prototyped
>
> 1) How do DAP/IDE's represent very large hierarchical data? Infosets can
> be huge, and most of the time a user only cares about the most recent
> infoset item. So some way to follow and show just the most recent part of
> the infoset is important. The current Daffodil debugger has an
> "infosetLines" setting so that it only shows the most recent X number of
> lines, which is mostly all a user cares about when stepping through a parse.
>
> 2) Infoset items are added and removed very frequently during a parse.
> Currently, when the Daffodil debugger shows the infoset it just converts
> the entire thing to XML and displays that. This doesn't work at all for
> large infosets since this can take a long time. I was hoping this issue
> would get resolved with this new debugging infrastructure. When the
> infoset is modified, we ideally want a way to specify via DAP that parts
> of the variable hierarchy were added/removed rather than having to send
> the entire infoset during every variable update.
>
> 3) I can imagine a feature where a user would want to select an infoset
> item and jump to the associated schema element, or query information
> about that infoset item (e.g., what bit position did it start at, what
> was the length). We don't have this right now, but it would be really nice
> to have. This suggests that we need metadata associated with each of the
> variables. Does DAP have a concept of that and do IDE's have a way to
> show it?
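Point 2 above asks for added/removed notifications instead of resending the whole infoset. A minimal Python sketch of that idea follows, assuming an infoset snapshot is represented as a flat mapping from a path string to a simple value. The function name `infoset_delta` and the path format are hypothetical, and whether DAP can carry such a delta is exactly the open question raised above.

```python
def infoset_delta(previous, current):
    """Compare two infoset snapshots (path -> value) and report the changes.

    Hypothetical helper for illustration; not part of Daffodil or DAP.
    """
    added = {p: v for p, v in current.items() if p not in previous}
    removed = sorted(p for p in previous if p not in current)
    changed = {p: v for p, v in current.items()
               if p in previous and previous[p] != v}
    return {"added": added, "removed": removed, "changed": changed}


step1 = {"/msg/header/len": "12", "/msg/body[1]": "ab"}
step2 = {"/msg/header/len": "12", "/msg/body[1]": "ab", "/msg/body[2]": "cd"}

forward = infoset_delta(step1, step2)    # the parser added one element
backtrack = infoset_delta(step2, step1)  # a backtrack removed it again
print(forward)
print(backtrack)
```

Only the delta needs to cross the debugger boundary on each step, so the cost is proportional to what changed rather than to the size of the infoset.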
>
> On 4/21/21 7:52 PM, Adam Rosien wrote:
> > I've been reading up on DAP and wanted to share...
> >
> >> There are many areas though that are unique to Daffodil that have no
> > representation in the spec.  These things (like InputStream, Infoset,
> PoU,
> > different variable types, backtracking, etc) will need an extension to
> > DAP.  This really boils down to defining these things to fit under the
> DAP
> > BaseProtocol and enabling handling of those objects on both the front and
> > back ends.
> >
> > To me, much of the current state exposed by the (Daffodil) Debugger
> > translates directly to a DAP Variable[1]. DAP Variables can be
> > nested/hierarchical, so they could (potentially) model larger data like
> the
> > infoset. I can imagine shoving all the current state into Variables as a
> > proof-of-concept.
> >
> > It also seems like the processing stack maintained by the Daffodil
> PState,
> > where each item references the relevant schema element, could translate
> to
> > the DAP StackFrame type [2]. That is, the path from the schema root to
> the
> > currently processing schema element becomes the "call stack". (Apologies
> if
> > I don't have all the Daffodil terms lined up correctly.)
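The two mappings suggested above can be sketched in Python as plain dictionaries shaped like the DAP `StackFrame` and `Variable` types. The field names (`id`, `name`, `source`, `line`, `column`, `value`, `variablesReference`) come from the DAP specification; the helper names, the schema path representation, and the sample data are assumptions made for illustration.

```python
def to_stack_frames(schema_path, schema_file):
    """Map a root-to-current schema path onto DAP StackFrame-shaped dicts.

    schema_path: list of (element name, schema line) pairs, root first.
    """
    frames = []
    # Put the innermost (current) element first, the way a call stack is shown.
    for frame_id, (name, line) in enumerate(reversed(schema_path), start=1):
        frames.append({
            "id": frame_id,
            "name": name,
            "source": {"path": schema_file},
            "line": line,
            "column": 1,
        })
    return frames


def to_variables(node, refs):
    """Map one infoset level onto DAP Variable-shaped dicts.

    node: dict of name -> simple value, or name -> dict for complex elements.
    refs: handle table; a client would request children by handle later.
    """
    variables = []
    for name, value in node.items():
        if isinstance(value, dict):
            handle = len(refs) + 1
            refs[handle] = value
            # A positive variablesReference signals that children exist.
            variables.append({"name": name, "value": "{...}",
                              "variablesReference": handle})
        else:
            variables.append({"name": name, "value": str(value),
                              "variablesReference": 0})
    return variables


frames = to_stack_frames([("message", 10), ("header", 12), ("length", 14)],
                         "schema.dfdl.xsd")
refs = {}
top_level = to_variables({"header": {"length": "12"}, "body": "abcd"}, refs)
print(frames[0]["name"], top_level)
```

Because children are fetched lazily by handle, a client only pays for the parts of the infoset the user actually expands.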
> >
> > For displaying the input data and processing progress, I looked at a few
> > existing VS Code extensions that provided non-builtin views, some of
> which
> > interact with their DAP debugger code [3] [4] [5] [6].
> >
> > Finally, I took a cursory look at scala-debug-adapter [7], which, for
> > reference, wraps Microsoft's java-debug implementation of DAP [8]. I was
> > curious about the set of request/response and event types. Additionally,
> > the TypeScript API to VS Code offers custom DAP requests and responses,
> but
> > I couldn't find the equivalent notion in the java-debug project.
> >
> > .. Adam
> >
> > [1]
> >
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
> > [2]
> >
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_StackFrame
> > [3] https://github.com/scalameta/metals-vscode (provides a debugger and
> > non-debugger custom UI)
> > [4] https://github.com/microsoft/vscode-cpptools (debugger + memory
> view)
> > [5]
> https://marketplace.visualstudio.com/items?itemName=marus25.cortex-debug
> > (debugger + memory view,
> >
> https://github.com/Marus/cortex-debug/blob/master/src/frontend/memory_content_provider.ts
> > )
> > [6]
> >
> https://marketplace.visualstudio.com/items?itemName=slevesque.vscode-hexdump
> > (extension for hexdumps that could be controlled by other extensions)
> > [7] https://github.com/scalacenter/scala-debug-adapter
> > [8] https://github.com/microsoft/java-debug
> >
> > On Tue, Apr 20, 2021 at 7:08 AM John Wass <jw...@gmail.com> wrote:
> >
> >>> Going to look deeper into how DAP might fit with Daffodil
> >>
> >> Have been looking over DAP and getting a good feeling about it. The
> >> specification [1] seems general enough that it could be applied to
> Daffodil
> >> and cover a swath of common operations (like start, stop, break,
> continue,
> >> code locations, variables, etc).
> >>
> >> There are many areas though that are unique to Daffodil that have no
> >> representation in the spec.  These things (like InputStream, Infoset,
> PoU,
> >> different variable types, backtracking, etc) will need an extension to
> >> DAP.  This really boils down to defining these things to fit under the
> DAP
> >> BaseProtocol and enabling handling of those objects on both the front
> and
> >> back ends.
> >>
> >> On the backend we need a Daffodil DAP protocol server.  Existing JVM
> >> implementations (like Java [2], Scala [3]) are tied closely to JDI and
> >> would bring a lot of extra baggage to work around that.  Developing a
> >> Daffodil specific implementation is no small task, but feasible.  There
> are
> >> several existing implementations on the JVM that are close and can be
> >> looked at for reference.
> >>
> >> The backend implementation would look similar to what was described in
> an
> >> earlier post.  We could use ZIO/Akka/etc to implement the backend
> Protocol
> >> Server to enable the IO between the Daffodil process and the DAP
> clients.
> >> This implementation would now be guided by the DAP specification.
> >>
> >> With the protocol and backend extended to fit Daffodil that leaves the
> >> frontend.  In theory an existing IDE plugin should get pretty close to
> >> being able to perform the common debug operations mentioned above.  To
> >> support the Daffodil extensions there will need to be handling of the
> >> extended protocol into whatever views are desired/applicable.
> >>
> >>> Also looking into the Java Debug Interface (JDI) for comparison.
> >>
> >> JDI appears to be the wrong level of abstraction for what we are talking
> >> about in debugging Daffodil for schema development.  While DAP does do
> JVM
> >> debugging (through a JDI DAP impl) it also generalizes to many other
> >> debugging scenarios.  JDI on the other hand is very tied to the JVM.
> >>
> >> Extending the JDI appears to be more complex than dealing with DAP, and
> >> even though the JDI API is mostly defined with interfaces, there are
> choke
> >> points that limit to JVM concepts.  For example jdi.Value has a finite
> set
> >> of JVM types that it works with, and it's not clear where Daffodil types
> would
> >> plug in, if that is even possible.
> >>
> >> The final note is that unique Daffodil features wouldn’t get to IDE
> support
> >> any faster with JDI.  In some cases, like VS Code, you would still need an
> >> extended DAP to support these features.
> >>
> >>> and depending on how it shakes out will update the example to show
> >> integration
> >>
> >> It would appear wise to investigate DAP further.  Next step is to refine
> >> these thoughts with a prototype. I started an implementation in the
> example
> >> debugger project [4] to try to run the current example on a _minimal_
> DAP
> >> implementation.
> >>
> >>
> >> [1] https://microsoft.github.io/debug-adapter-protocol/specification
> >> [2] https://github.com/Microsoft/java-debug
> >> [3] https://github.com/scalacenter/scala-debug-adapter
> >> [4] https://github.com/jw3/example-daffodil-debug
> >>
> >>
> >> On Mon, Apr 12, 2021 at 9:58 AM John Wass <jw...@gmail.com> wrote:
> >>
> >>>> the code is here https://github.com/jw3/example-daffodil-debug
> >>>
> >>> There is now a complete console based example for Zio that demonstrates
> >>> controlling the debug flow while distributing the current state to
> three
> >>> "displays".
> >>> 1. infoset at current step
> >>> 2. diff of infoset against previous step
> >>> 3. bit position and value of data.
> >>>
> >>> These displays are very rudimentary but demonstrate the ability to
> >>> asynchronously populate multiple views while synchronously controlling
> >> the
> >>> debug loop.
> >>>
> >>>> - The new protocol being informed by existing debugger and DAP is key
> >>>
> >>> Going to look deeper into how DAP might fit with Daffodil, and
> depending
> >>> on how it shakes out will update the example to show integration.
> >>>
> >>> Some interesting links to start with
> >>> - https://github.com/scalacenter/scala-debug-adapter
> >>> -
> >>>
> >>
> https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
> >>> - https://github.com/microsoft/java-debug
> >>>
> >>> Also looking into the Java Debug Interface (JDI) for comparison.
> >>>
> >>>
> >>> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:
> >>>
> >>>> Revisiting this post after doing some debugger related work and
> thinking
> >>>> about debug protocol/adapters to connect external tooling to the debug
> >>>> process.
> >>>>
> >>>> This comment is good
> >>>>
> >>>>> This also makes me wonder if an approach worth taking for the future
> >> of
> >>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
> >>>> Protocol". I imagine it would be loosely based on DAP (which is
> >>>> essentially JSON message based) but could be targeted to the things
> >> that a
> >>>> DFDL schema debugger would really need. An added benefit with some
> >> sort of
> >>>> protocol is the debugger interface can be uncoupled from Daffodil
> >>>> itself, so we could implement a TUI/GUI/whatever in any  language/GUI
> >>>> framework and just have it communicate the protocol over some form of
> >>>> IPC. Another benefit is that any future backends could implement this
> >>>> protocol and so a single debugger could hook into different backends
> >>>> without much issue. Unfortunately, defining such a protocol might be a
> >>>> large task, but we do have our existing debug infrastructure and
> things
> >>>> like DAP to guide its development/design.
> >>>>
> >>>> Some thoughts on this
> >>>> - Defining the protocol will be a large task, but a minimal version
> >>>> should get up and round tripping quickly with a minimal subset of the
> >>>> protocol.
> >>>> - The new protocol being informed by existing debugger and DAP is key
> >>>> - Uncoupling from Daffodil is key
> >>>> - Adapt the Daffodil protocol to produce DAP after the fact so as not
> to
> >>>> constrain Daffodil debugging capability
> >>>> - We don't need to tie the protocol or adapters to a single framework,
> >>>> implementations of the IO layer should be simple enough to support
> >> multiple
> >>>> things (eg Akka, Zio, "basic" ...)
> >>>> - The current debugger lives in runtime1, but can we make an abstract
> >> API
> >>>> that any runtime would implement?
> >>>>
> >>>> Maybe a solution is structured like this
> >>>> - daffodil-debug-api:
> >>>>   - protocol model
> >>>>   - interfaces: debugger / IO adapter / etc
> >>>>   - lives in daffodil repo (new subproject?)
> >>>> - daffodil-debug-io-NAME
> >>>>   - provides implementation of a specific IO adapter
> >>>>   - multiple projects possible (daffodil-debugger-akka,
> >>>> daffodil-debugger-zio, etc)
> >>>>   - supported ones live in their own subprojects, but others can be
> >>>> plugged in from external sources
> >>>>   - ability to support multiple implementations reduces risk of
> lock-in
> >>>> - debugger applications
> >>>>   - maintained in external repositories
> >>>>   - depending on the IO implementation these could execute in a
> >> separate
> >>>> process or on a separate machine
> >>>>   - like Steve said, could be any language / framework
> >>>>
> >>>> Three types of reference implementations / sample applications could
> >> also
> >>>> guide the development of the API
> >>>>   1. a replacement for the existing TUI debugger, expected to end up
> >> with
> >>>> at minimum the same functionality as the current one.
> >>>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
> >>>>   3. an IDE integration
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Also I'm working on some reference implementations of these concepts
> >>>> using Akka and Zio.  Not quite ready to talk through it yet, but the
> >> code
> >>>> is here https://github.com/jw3/example-daffodil-debug
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Yep, something like that seems very reasonable for dealing with large
> >>>>> infosets. But it still feels like we still run into usability issues.
> >>>>> For example, what if a user wants to see more? We need some
> >>>>> configuration options to increase what we've elided. It's not big,
> but
> >>>>> every new thing that needs configuration adds complexity and
> decreases
> >>>>> usability.
> >>>>>
> >>>>> And I think the only reason we are trying to spend effort eliding
> >>>>> things is because we're limited to this gdb-like interface where you
> >> can
> >>>>> only print out a little information at a time.
> >>>>>
> >>>>> I think what would really help is to dump this gdb interface and instead
> use
> >>>>> multiple windows/views. As a really close example to what I imagine,
> I
> >>>>> recently came across this hex editor:
> >>>>>
> >>>>> https://www.synalysis.net/
> >>>>>
> >>>>> The screenshots are a bit small so it's not super clear, but this
> tool
> >>>>> has one view for the data in hex, and one view for a tree of parsed
> >>>>> results (which is very similar to our infoset). The "infoset" view
> has
> >>>>> information like offset/length/value, and can be related back to the
> >>>>> data view to find the actual bits.
> >>>>>
> >>>>> I imagine the "next generation daffodil debugger" to look much like
> >>>>> this. As data is parsed, the infoset view fills up. This view could
> act
> >>>>> like a standard GUI tree so you could collapse sections or scroll
> >> around
> >>>>> to show just the parts you care about, and have search capabilities
> to
> >>>>> quickly jump around. The advantage here is you no longer really need
> >>>>> automated eliding or heuristics for what the user *might* care about.
> >>>>> You just show the whole thing and let user scroll around. As daffodil
> >>>>> parses and backtracks, this tree grows or shrinks.
> >>>>>
> >>>>> I also imagine you could have a cursor moving around the hex view, so
> >> as
> >>>>> daffodil moves around (e.g. scanning for delimiters, extracting
> >>>>> integers), one could update this data view to show what daffodil is
> >>>>> doing and where it is.
> >>>>>
> >>>>> I also imagine there could be other views as well. For example, a
> schema
> >>>>> view to show where in the schema daffodil is, and to add/remove
> >>>>> breakpoints. And an information view for things like variables,
> >> in-scope
> >>>>> delimiters, PoU's, etc.
> >>>>>
> >>>>> The only reason I mention a debug protocol is that it would allow this
> GUI
> >>>>> to be more easily written in something other than Java/Scala to take
> >>>>> advantage of other GUI toolkits. It's been a long while since I've
> done
> >>>>> anything with Java GUIs, but they seemed pretty poor the last time I
> looked
> >>>>> at them. It would even allow for a TUI, which Java has little/no support
> >>>>> for. It also enables things like remote debugging if socket IPC were
> >>>>> used. Though I'm not sure all of that is necessary. Just thinking
> what
> >>>>> would be ideal, and it can always be pared back.
> >>>>>
> >>>>>
> >>>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> >>>>>> I don't think of it as a daffodil debug protocol, but just a
> >>>>> separation of concerns between display of information and the
> >> behaviors of
> >>>>> parse/unparse that need to be points where users can pause, and data
> >>>>> structures available to display.
> >>>>>>
> >>>>>> E.g., it is 100% a display issue that the infoset (shown as XML) is
> >>>>> clumsy, too big, etc.  The infoset is available in the processor
> >> state, and
> >>>>> one can examine the current node, enclosing node, prior sibling(s),
> >>>>> following sibling(s), etc. One can elide contents that are too big
> for
> >>>>> hexBinary, etc.
> >>>>>>
> >>>>>> I think this problem, how to display the infoset with sensible
> limits
> >>>>> on sizing, is fairly easy to come up with some design for, that will
> at
> >>>>> least be (1) always fairly small (2) much more useful in more cases.
> It
> >>>>> won't be perfect but can be much better than what we do now.
> >>>>>>
> >>>>>> One sensible display "mode" should be that displaying the context
> >>>>> surrounding the current element (when parsing or unparsing) displays
> at
> >>>>> most N-lines. (N/2 before, N/2 after) with a maximum length of L
> >> characters
> >>>>> (settable within reason ?)
> >>>>>>
> >>>>>> Sibling and enclosing nodes would be displayed eliding their
> contents
> >>>>> to at most 1 line.
> >>>>>>
> >>>>>> Here's an example of what I mean. Displaying up to M=10 lines total:
> >>>>>>
> >>>>>> ...
> >>>>>> <enclosingParent1>
> >>>>>>    ...
> >>>>>>    <priorSibling2>89ab782 ...</...>
> >>>>>>    <priorSibling1>some text is here and some more text</...>
> >>>>>>    <currentNode>value might be some big thing which needs to be
> >> elided
> >>>>> ...</...>
> >>>>>>    <followingSibling1> ... </...>
> >>>>>>    ???
> >>>>>> </enclosingParent1>
> >>>>>> ???
> >>>>>>
> >>>>>> The </...> is just an idea to reduce XML matching end-tag clutter.
> >>>>>>
> >>>>>> The ... on a line alone or where element content would appear
> >>>>> generally means 1 or more other siblings. The way the display above
> >> starts
> >>>>> with ... means that this is a relative inner nest, not starting from
> >> the
> >>>>> absolute root.
> >>>>>>
> >>>>>> The ... within simple content means that content is elided to fit on
> >>>>> one line. Always follows some text characters to differentiate from
> the
> >>>>> child-element context.
> >>>>>>
> >>>>>> The ??? means zero or more other siblings.
> >>>>>>
> >>>>>> I used bold italic above to point out that the current node would be
> >>>>> highlighted somehow. Probably a way to do this that doesn't require
> >> display
> >>>>> modes would be useful. E.g., a text marker like ">>>" as in:
> >>>>>>
> >>>>>>>>> <currentNode>value .... </...>
> >>>>>>
> >>>>>> might be better, particularly for a trace output being dumped to a
> >>>>> text file.
> >>>>>>
> >>>>>> I made the above example an unparser kind of example by showing a
> >>>>> following sibling that exists that is after the current node.
> >>>>>>
> >>>>>> I think the key concept is that any sibling node is displayed in a
> >> way
> >>>>> that fits on one line.
> >>>>>> E.g., even if the element name was really long, I'd suggest:
> >>>>>>
> >>>>>>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> >>>>>>
> >>>>>> Where the element name itself gets elided because it is too long.
> >>>>>>
> >>>>>> A thought. Note that the above presentation is shown as quasi-XML,
> >> but
> >>>>> there's nothing XML-specific about it. A JSON-friendly equivalent
> >> could be
> >>>>> done as well:
> >>>>>>
> >>>>>> enclosingParent1 = {
> >>>>>>    ...
> >>>>>>    priorSibling2 = "89ab782..."
> >>>>>>    priorSibling1 = "some text is here and some more text"
> >>>>>>    currentNode = "value might be some big thing which needs to be
> >>>>> elided ..."
> >>>>>>    followingSibling1 = { ... }
> >>>>>>    ???
> >>>>>> }
> >>>>>>
> >>>>>> That's enough for 1 email thread on this debug topic.
> >>>>>>
> >>>>>>
> >>>>>> ________________________________
> >>>>>> From: Steve Lawrence <sl...@apache.org>
> >>>>>> Sent: Tuesday, January 5, 2021 2:26 PM
> >>>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> >>>>>> Subject: The future of the daffodil DFDL schema debugger?
> >>>>>>
> >>>>>>
> >>>>>> Now that we're in a new year, I'd like to start a discussion about
> >> the
> >>>>>> Daffodil DFDL Schema debugger and how it might be improved to be
> more
> >>>>>> useful.
> >>>>>>
> >>>>>> Note that this is not the capabilities to debug Daffodil itself in
> >>>>>> something like Eclipse/IntelliJ, but the ability for Daffodil to
> >>>>> provide
> >>>>>> enough extra information during a parse/unparse so that a schema
> >>>>>> developer can get an idea of what Daffodil is doing. This makes it
> >>>>>> easier for users (rather than developers) to determine why a schema
> >>>>>> isn't giving the expected parse/unparse result (either because of bad
> >>>>> data
> >>>>>> or a faulty schema).
> >>>>>>
> >>>>>> The current state of the debugger is enabled by providing the
> --debug
> >>>>> or
> >>>>>> --trace flags in the CLI. More information about that here:
> >>>>>>
> >>>>>> https://daffodil.apache.org/debugger/
> >>>>>>
> >>>>>> This enables a TUI and commands somewhat similar to GDB, providing
> >>>>> things
> >>>>>> like breakpoints, steps, displaying the current infoset, displaying a
> >> dump
> >>>>>> of the data, etc.
> >>>>>>
> >>>>>> Although I find this tool pretty useful, it definitely has some
> >> glaring
> >>>>>> issues.
> >>>>>>
> >>>>>> The most glaring to me is that it really isn't useful at all for
> >>>>>> debugging unparse. The data dumps only include the main
> >> output stream,
> >>>>>> so determining things like suspensions and buffered output is
> >> impossible.
> >>>>>>
> >>>>>> Another issue is the infoset output. When outputting the infoset,
> the
> >>>>>> debugger currently just walks the entire thing and converts it to
> XML
> >>>>>> and displays the XML. For large infosets, this is excessive and can
> make
> >>>>> it
> >>>>>> impossible to use, even with some configurations that limit how much
> >> of
> >>>>>> that infoset is actually printed to the screen. Also things like
> >> large
> >>>>>> hex binary blobs create excessive and unusable output.
> >>>>>>
> >>>>>> Another thing I feel is missing is a schema view. Right now it's
> very
> >>>>>> difficult to know where in the schema Daffodil actually is.
> >>>>>>
> >>>>>> I think these issues just need some thought and improvement. One could
> >>>>>> imagine a better way to stringify our unparse buffers for debug. One
> >>>>>> could imagine a way to receive infoset state changes so the debugger
> >> can
> >>>>>> track things like backtracks and remove infosets. One could imagine a
> >> way
> >>>>>> to display the schema.
> >>>>>>
> >>>>>> We just need a better way to stringify the current state of the
> >> unparse
> >>>>>> data including buffers, and we need a way for the debugger to
> >>>>> receive
> >>>>>> state change information about infoset so it can update displays
> >> rather
> >>>>>> than just constantly printing the entire infoset.
> >>>>>>
> >>>>>> However, I think another big issue is just usability in
> >> general.
> >>>>> I
> >>>>>> think the CLI usage is reasonable, but it's not always user
> friendly,
> >>>>>> and is difficult to view multiple things at the same time. I think
> >>>>>> because of this very few people even use this tool. So this seems
> like
> >>>>>> perhaps something worth focusing on.
> >>>>>>
> >>>>>> My first thought to improving this usability issue would be to
> >>>>> implement
> >>>>>> the Debug Adapter Protocol (DAP)
> >>>>>> (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
> >>>>>> which many IDE's implement. With this implemented, Daffodil could be
> >>>>>> plugged in to any IDE that supports it and essentially get debugging
> >>>>> for
> >>>>>> free, without the need to worry about the GUI elements.
> >>>>>>
> >>>>>> I do have concerns that this just wouldn't have enough functionality
> >>>>>> that we'd really need. For example, DAP really only has the ability to show
> >>>>>> code (Daffodil's equivalent is the DFDL schema). There isn't a way
> to
> >>>>>> show a live view of the infoset or data. Most DAP IDE's do have a
> >>>>>> console output, so we could potentially make it so the console
> output
> >>>>> is
> >>>>>> a live view of infoset/data. But I'm not even sure most DAP friendly
> >>>>>> IDE's could support this kind of console output. Does anyone have
> >>>>>> familiarity with DAP IDE's and what kinds of console capabilities
> >>>>> are
> >>>>>> available?
> >>>>>>
> >>>>>> I also looked into TUI libraries with the idea that we could just
> >>>>> extend
> >>>>>> our current debugger user interface to be a bit friendlier.
> >>>>>> Unfortunately, there aren't too many Java/Scala TUI libraries and
> >> those
> >>>>>> that do exist don't have Apache friendly licenses. We also want to
> be
> >>>>>> careful about increasing dependencies just for a debugger that many
> >>>>> people
> >>>>>> might not use, so large graphics libraries are probably out of the
> >>>>> question.
> >>>>>>
> >>>>>> This also makes me wonder if an approach worth taking for the future
> >> of
> >>>>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
> >>>>>> Protocol". I imagine it would be loosely based on DAP (which is
> >>>>>> essentially JSON message based) but could be targeted to the things
> >>>>> that
> >>>>>> a DFDL schema debugger would really need. An added benefit with some
> >>>>>> sort of protocol is the debugger interface can be uncoupled from
> >>>>>> Daffodil itself, so we could implement a TUI/GUI/whatever in any
> >>>>>> language/GUI framework and just have it communicate the protocol
> over
> >>>>>> some form of IPC. Another benefit is that any future backends could
> >>>>>> implement this protocol and so a single debugger could hook into
> >>>>>> different backends without much issue. Unfortunately, defining such
> a
> >>>>>> protocol might be a large task, but we do have our existing debug
> >>>>>> infrastructure and things like DAP to guide its development/design.
> >>>>>>
> >>>>>> Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
> >> we
> >>>>>> really just need the few improvements mentioned to the existing
> >>>>>> debugger. Is that enough to make it usable? Or is an entirely
> >> different
> >>>>>> approach needed to debugging schemas?
> >>>>>>
> >>>>>
> >>>>>
> >>
> >
>
>

Re: The future of the daffodil DFDL schema debugger?

Posted by Adam Rosien <ad...@rosien.net>.
Your message is extremely helpful! I'll spend some time working through it
and follow up.

On Mon, May 24, 2021 at 9:48 AM Beckerle, Mike <
mbeckerle@owlcyberdefense.com> wrote:

> Some thoughts re: data format debugger
>
> I suggest we enumerate
>
>   *   every single piece of state of the parser,
>   *   every single piece of state of the unparser,
>   *   each action/step of the parser,  (every parse combinator or
> primitive, their subactions)
>   *   and of the unparser, (every unparse combinator, primitive,
> suspension,...)
>
> and wire-frame/mock-up some display for each piece of state, and how, if
> changed by a step, the change to that piece of state would be displayed.
>
> We can write down the nuances associated with these data items/actions
> that impact debugger display.
>
> Some of these states/actions will be analogous to things in conventional
> debuggers. (e.g., looking at the values of variables) Others will be
> specific to DFDL needs. (e.g., looking at layers in the data stream,
> visualizing delimiter scanning success/failure, backtracking)
>
> Core concepts a debugger needs are framing vs. content vs. value, and the
> "regions" in the data stream that make these up. The framing includes
> initiators, terminators, separators, alignment regions, prefix-length
> regions, leading/trailing skip regions, unused regions. Those surround the
> content region, and when padding/filling is involved (for simple types that
> are textual) the content region contains leading pad and trailing pad
> regions, surrounding the value region.
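
The framing/content/value nesting described above can be pictured as a small tree of regions over bit offsets. The following Python sketch is only an illustration: the `Region` class, the region names, and the offsets are invented here and are not Daffodil's actual data structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """A named span of the data stream, measured in bits (hypothetical model)."""
    name: str
    start_bit: int
    length_bits: int
    children: List["Region"] = field(default_factory=list)

    @property
    def end_bit(self) -> int:
        return self.start_bit + self.length_bits


def render(region: Region, depth: int = 0) -> List[str]:
    """Return one indented line per region, outermost first."""
    lines = [f"{'  ' * depth}{region.name} [{region.start_bit}, {region.end_bit})"]
    for child in region.children:
        lines.extend(render(child, depth + 1))
    return lines


# An invented 8-bit-per-character text element: "[" + " 42 " + "]".
element = Region("element", 0, 48, [
    Region("initiator", 0, 8),
    Region("content", 8, 32, [
        Region("leadingPad", 8, 8),
        Region("value", 16, 16),
        Region("trailingPad", 32, 8),
    ]),
    Region("terminator", 40, 8),
])

print("\n".join(render(element)))
```

A debugger view could walk such a tree to highlight, in the hex display, which bits are framing and which are the value.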
>
> An example of graphical nested box representation of these regions is here
> in a design note about Daffodil:
>
>
> https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/
> (see section "Details of Unique and Shared Regions")
>
> The way to start this effort is to look at the UState and PState classes.
> These are the state blocks. Every piece of these is potentially important
> to the debugger.
>
> Lastly, an important aspect of Daffodil is the streaming behavior of the
> parser and unparser. While I believe it is more important to get something
> working than for it to cover every feature, this is an area where not
> anticipating how it needs to work is likely to lock one out of a future
> scenario that accommodates it.
>
> So the parser doesn't produce an infoset. It produces a stream of infoset
> events, or call-backs to be exact.
> Due to backtracking in the parser, these events can be hung-up for
> substantial time while the parser continues. So we can't assume that there
> is any sort of correlation between parser activity and the producing of
> events.
>
> The unparser doesn't consume an infoset. It consumes a stream of infoset
> events. Specifically, the unparser is the callback-handler for unparse
> infoset events.
>
> The infoset gets trimmed so that we needn't build up the complete infoset
> tree in memory. As parse-events are produced, no-longer necessary parts of
> the infoset are pruned away. Similarly, when unparsing, once a part of the
> infoset has been unparsed, that part of the infoset tree is pruned away if
> no longer needed.
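
To make the "events can be hung up by backtracking" point concrete, here is a minimal Python sketch of a buffer that releases infoset events only once no point of uncertainty is still open. The `EventBuffer` class and its `mark`/`commit`/`backtrack` methods are hypothetical names for illustration, not Daffodil's API, and the event strings are made up.

```python
from typing import Callable, List


class EventBuffer:
    """Holds infoset events until no backtracking can undo them (hypothetical)."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self._emit = emit
        self._pending: List[str] = []
        self._marks: List[int] = []  # index into _pending at each open mark

    def mark(self) -> None:
        """Enter a region the parser might still backtrack out of."""
        self._marks.append(len(self._pending))

    def add(self, event: str) -> None:
        self._pending.append(event)
        self._flush()

    def commit(self) -> None:
        """The most recent speculative region succeeded."""
        self._marks.pop()
        self._flush()

    def backtrack(self) -> None:
        """The most recent speculative region failed; drop its events."""
        del self._pending[self._marks.pop():]

    def _flush(self) -> None:
        # Events are only safe to release once no marks remain open.
        if not self._marks:
            for event in self._pending:
                self._emit(event)
            self._pending.clear()


delivered: List[str] = []
buf = EventBuffer(delivered.append)
buf.add("startElement(record)")   # delivered immediately
buf.mark()
buf.add("startElement(branchA)")  # held: may still be undone
buf.backtrack()                   # branchA failed, its event is dropped
buf.mark()
buf.add("startElement(branchB)")  # held
buf.commit()                      # branchB succeeded, event is released
print(delivered)
```

The consequence for a debugger is visible here: the events a front end receives can lag well behind the parser's actual position, so the two cannot be assumed to move in lockstep.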
>
>
> ________________________________
> From: Steve Lawrence <sl...@apache.org>
> Sent: Thursday, April 22, 2021 9:32 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> Some thoughts related to showing the infoset as if it were a variable as
> this is prototyped
>
> 1) How do DAP/IDE's represent very large hierarchical data? Infosets can
> be huge, and most of the time a user only cares about the most recent
> infoset item. So some way to follow and show just the most recent part of
> the infoset is important. The current Daffodil debugger has an
> "infosetLines" setting so that it only shows the most recent X number of
> lines, which is mostly all a user cares about when stepping through a parse.
>
> 2) Infoset items are added and removed very frequently during a parse.
> Currently, when the Daffodil debugger shows the infoset it just converts
> the entire thing to XML and displays that. This doesn't work at all for
> large infosets since this can take a long time. I was hoping this issue
> would get resolved with this new debugging infrastructure. When the
> infoset is modified, we ideally want a way to specify via DAP that parts
> of the variable hierarchy were added/removed rather than having to send
> the entire infoset during every variable update.
>
> 3) I can imagine a feature where a user would want to select an infoset
> item and jump to the associated schema element, or query information
> about that infoset item (e.g., what bit position did it start at, what
> was the length). We don't have this right now, but would be really nice
> to have. This suggests that we need metadata associated with each of the
> variables. Does DAP have a concept of that and do IDE's have a way to
> show it?
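
Point 2 above asks for sending only what changed rather than the whole infoset. As a rough Python sketch of that idea, assuming the infoset is flattened into a path-to-value map (an assumption made here for brevity, not how Daffodil stores it), a per-step delta could look like this:

```python
from typing import Dict, List, Tuple

Infoset = Dict[str, str]  # path -> simple value ("" for complex elements)


def infoset_delta(before: Infoset, after: Infoset) -> Tuple[List[str], List[str]]:
    """Return (added_paths, removed_paths) between two steps, each sorted."""
    added = sorted(p for p in after if p not in before)
    removed = sorted(p for p in before if p not in after)
    return added, removed


step1 = {"/msg": "", "/msg/hdr": "", "/msg/hdr/len": "12"}
# Invented scenario: the parser backtracked out of hdr/len and parsed hdr/tag.
step2 = {"/msg": "", "/msg/hdr": "", "/msg/hdr/tag": "AB"}

added, removed = infoset_delta(step1, step2)
print("added:  ", added)
print("removed:", removed)
```

Diffing two snapshots is the simplest way to show the shape of the message. A real implementation would more likely have the runtime report additions and removals directly, so that the cost per step does not grow with the size of the infoset.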
>
> On 4/21/21 7:52 PM, Adam Rosien wrote:
> > I've been reading up on DAP and wanted to share...
> >
> >> There are many areas though that are unique to Daffodil that have no
> > representation in the spec.  These things (like InputStream, Infoset,
> PoU,
> > different variable types, backtracking, etc) will need an extension to
> > DAP.  This really boils down to defining these things to fit under the
> DAP
> > BaseProtocol and enabling handling of those objects on both the front and
> > back ends.
> >
> > To me, much of the current state exposed by the (Daffodil) Debugger
> > translates directly to a DAP Variable[1]. DAP Variables can be
> > nested/hierarchical, so they could (potentially) model larger data like
> the
> > infoset. I can imagine shoving all the current state into Variables as a
> > proof-of-concept.
> >
> > It also seems like the processing stack maintained by the Daffodil
> PState,
> > where each item references the relevant schema element, could translate
> to
> > the DAP StackFrame type [2]. That is, the path from the schema root to
> the
> > currently processing schema element becomes the "call stack". (Apologies
> if
> > I don't have all the Daffodil terms lined up correctly.)
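
The "schema path as call stack" idea above might be sketched in Python as follows. The field names (`id`, `name`, `source`, `line`, `column`) follow the DAP StackFrame type; the schema path, file name, and line numbers are invented, and `to_stack_frames` is a hypothetical helper rather than anything in Daffodil or java-debug.

```python
import json
from typing import Dict, List, Tuple

# (element name, schema file, line) from the schema root down to the element
# currently being processed. All values are invented for illustration.
SchemaPath = List[Tuple[str, str, int]]


def to_stack_frames(path: SchemaPath) -> List[Dict]:
    """Map a root-to-current schema path to DAP-style StackFrame objects.

    Debuggers conventionally list the innermost frame first, so the path
    is reversed before frame ids are assigned.
    """
    frames = []
    for frame_id, (name, schema_file, line) in enumerate(reversed(path)):
        frames.append({
            "id": frame_id,
            "name": name,
            "source": {"path": schema_file},
            "line": line,
            "column": 1,
        })
    return frames


path = [("message", "example.dfdl.xsd", 10),
        ("header", "example.dfdl.xsd", 14),
        ("length", "example.dfdl.xsd", 17)]

frames = to_stack_frames(path)
print(json.dumps(frames[0], indent=2))
```

With frames shaped like this, an IDE's existing call-stack panel could show where in the schema the parse currently is, and clicking a frame would open the schema at that line.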
> >
> > For displaying the input data and processing progress, I looked at a few
> > existing VS Code extensions that provided non-builtin views, some of
> which
> > interact with their DAP debugger code [3] [4] [5] [6].
> >
> > Finally, I took a cursory look at scala-debug-adapter [7], which, for
> > reference, wraps Microsoft's java-debug implementation of DAP. I was
> > curious about the set of request/response and event types. Additionally,
> > the Typescript API to VS Code offers custom DAP requests and responses,
> but
> > I couldn't find the equivalent notion in the java-debug project.
> >
> > .. Adam
> >
> > [1]
> >
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
> > [2]
> >
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_StackFrame
> > [3] https://github.com/scalameta/metals-vscode (provides a debugger and
> > non-debugger custom UI)
> > [4] https://github.com/microsoft/vscode-cpptools (debugger + memory
> view)
> > [5]
> https://marketplace.visualstudio.com/items?itemName=marus25.cortex-debug
> > (debugger + memory view,
> >
> https://github.com/Marus/cortex-debug/blob/master/src/frontend/memory_content_provider.ts
> > )
> > [6]
> >
> https://marketplace.visualstudio.com/items?itemName=slevesque.vscode-hexdump
> > (extension for hexdumps that could be controlled by other extensions)
> > [7] https://github.com/scalacenter/scala-debug-adapter
> > [8] https://github.com/microsoft/java-debug
> >
> > On Tue, Apr 20, 2021 at 7:08 AM John Wass <jw...@gmail.com> wrote:
> >
> >>> Going to look deeper into how DAP might fit with Daffodil
> >>
> >> Have been looking over DAP and getting a good feeling about it. The
> >> specification [1] seems general enough that it could be applied to
> Daffodil
> >> and cover a swath of common operations (like start, stop, break,
> continue,
> >> code locations, variables, etc).
> >>
> >> There are many areas though that are unique to Daffodil that have no
> >> representation in the spec.  These things (like InputStream, Infoset,
> PoU,
> >> different variable types, backtracking, etc) will need an extension to
> >> DAP.  This really boils down to defining these things to fit under the
> DAP
> >> BaseProtocol and enabling handling of those objects on both the front
> and
> >> back ends.
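
As a sketch of what "fitting under the DAP BaseProtocol" could mean on the wire: DAP messages are JSON bodies preceded by a `Content-Length` header, and events carry `seq`, `type`, `event`, and an optional `body`. The Python below frames and unframes one such message. The event name `daffodil.infoset` and its body fields are hypothetical, chosen only to show where a Daffodil-specific extension would sit.

```python
import json
from typing import Dict, Tuple


def encode_message(message: Dict) -> bytes:
    """Frame a JSON message with a Content-Length header, as DAP does."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> Tuple[Dict, bytes]:
    """Decode one framed message; return it plus any unread bytes.

    Assumes Content-Length is the only header field present.
    """
    header, _, rest = data.partition(b"\r\n\r\n")
    length = int(header.decode("ascii").split(":", 1)[1])
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


# "daffodil.infoset" and its body fields are hypothetical, not part of DAP.
event = {
    "seq": 1,
    "type": "event",
    "event": "daffodil.infoset",
    "body": {"path": "/msg/hdr/len", "value": "12", "bitPosition": 64},
}

wire = encode_message(event)
decoded, remaining = decode_message(wire)
print(decoded["event"], decoded["body"])
```

A stock DAP client would typically ignore an event it does not recognize, so the open question is mostly on the front-end side: which views consume these extra events.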
> >>
> >> On the backend we need a Daffodil DAP protocol server.  Existing JVM
> >> implementations (like Java [2], Scala [3]) are tied closely to JDI and
> >> would bring a lot of extra baggage to work around that.  Developing a
> >> Daffodil specific implementation is no small task, but feasible.  There
> are
> >> several existing implementations on the JVM that are close and can be
> >> looked at for reference.
> >>
> >> The backend implementation would look similar to what was described in
> an
> >> earlier post.  We could use ZIO/Akka/etc to implement the backend
> Protocol
> >> Server to enable the IO between the Daffodil process and the DAP
> clients.
> >> This implementation would now be guided by the DAP specification.
> >>
> >> With the protocol and backend extended to fit Daffodil that leaves the
> >> frontend.  In theory an existing IDE plugin should get pretty close to
> >> being able to perform the common debug operations mentioned above.  To
> >> support the Daffodil extensions there will need to be handling of the
> >> extended protocol into whatever views are desired/applicable.
> >>
> >>> Also looking into the Java Debug Interface (JDI) for comparison.
> >>
> >> JDI appears to be the wrong level of abstraction for what we are talking
> >> about in debugging Daffodil for schema development.  While DAP does do
> JVM
> >> debugging (through a JDI DAP impl) it also generalizes to many other
> >> debugging scenarios.  JDI on the other hand is very tied to the JVM.
> >>
> >> Extending the JDI appears to be more complex than dealing with DAP, and
> >> even though the JDI API is mostly defined with interfaces, there are
> choke
> >> points that limit to JVM concepts.  For example jdi.Value has a finite
> set
> >> of JVM types that it works with; it's not clear where Daffodil types
> would
> >> plug in, if that is even possible.
> >>
> >> The final note is that unique Daffodil features wouldn’t get to IDE
> support
> >> any faster with JDI.  In some cases, like VS Code, you would still need an
> >> extended DAP to support these features.
> >>
> >>> and depending on how it shakes out will update the example to show
> >> integration
> >>
> >> It would appear wise to investigate DAP further.  Next step is to refine
> >> these thoughts with a prototype. I started an implementation in the
> example
> >> debugger project [4] to try to run the current example on a _minimal_
> DAP
> >> implementation.
> >>
> >>
> >> [1] https://microsoft.github.io/debug-adapter-protocol/specification
> >> [2] https://github.com/Microsoft/java-debug
> >> [3] https://github.com/scalacenter/scala-debug-adapter
> >> [4] https://github.com/jw3/example-daffodil-debug
> >>
> >>
> >> On Mon, Apr 12, 2021 at 9:58 AM John Wass <jw...@gmail.com> wrote:
> >>
> >>>> the code is here https://github.com/jw3/example-daffodil-debug
> >>>
> >>> There is now a complete console based example for Zio that demonstrates
> >>> controlling the debug flow while distributing the current state to
> three
> >>> "displays".
> >>> 1. infoset at current step
> >>> 2. diff of infoset against previous step
> >>> 3. bit position and value of data.
> >>>
> >>> These displays are very rudimentary but demonstrate the ability to
> >>> asynchronously populate multiple views while synchronously controlling
> >> the
> >>> debug loop.
> >>>
> >>>> - The new protocol being informed by existing debugger and DAP is key
> >>>
> >>> Going to look deeper into how DAP might fit with Daffodil, and
> depending
> >>> on how it shakes out will update the example to show integration.
> >>>
> >>> Some interesting links to start with
> >>> - https://github.com/scalacenter/scala-debug-adapter
> >>> -
> >>>
> >>
> https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
> >>> - https://github.com/microsoft/java-debug
> >>>
> >>> Also looking into the Java Debug Interface (JDI) for comparison.
> >>>
> >>>
> >>> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:
> >>>
> >>>> Revisiting this post after doing some debugger related work and
> thinking
> >>>> about debug protocol/adapters to connect external tooling to the debug
> >>>> process.
> >>>>
> >>>> This comment is good
> >>>>
> >>>>> This also makes me wonder if an approach worth taking for the future
> >> of
> >>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
> >>>> Protocol". I imagine it would be loosely based on DAP (which is
> >>>> essentially JSON message based) but could be targeted to the things
> >> that a
> >>>> DFDL schema debugger would really need. An added benefit with some
> >> sort of
> >>>> protocol is the debugger interface can be uncoupled from Daffodil
> >>>> itself, so we could implement a TUI/GUI/whatever in any  language/GUI
> >>>> framework and just have it communicate the protocol over some form of
> >>>> IPC. Another benefit is that any future backends could implement this
> >>>> protocol and so a single debugger could hook into different backends
> >>>> without much issue. Unfortunately, defining such a protocol might be a
> >>>> large task, but we do have our existing debug infrastructure and
> things
> >>>> like DAP to guide its development/design.
> >>>>
> >>>> Some thoughts on this
> >>>> - Defining the protocol will be a large task, but a minimal version
> >>>> should get up and round tripping quickly with a minimal subset of the
> >>>> protocol.
> >>>> - The new protocol being informed by existing debugger and DAP is key
> >>>> - Uncoupling from Daffodil is key
> >>>> - Adapt the Daffodil protocol to produce DAP after the fact so as not
> to
> >>>> constrain Daffodil debugging capability
> >>>> - We don't need to tie the protocol or adapters to a single framework,
> >>>> implementations of the IO layer should be simple enough to support
> >> multiple
> >>>> things (eg Akka, Zio, "basic" ...)
> >>>> - The current debugger lives in runtime1, but can we make an abstract
> >> API
> >>>> that any runtime would implement?
> >>>>
> >>>> Maybe a solution is structured like this
> >>>> - daffodil-debug-api:
> >>>>   - protocol model
> >>>>   - interfaces: debugger / IO adapter / etc
> >>>>   - lives in daffodil repo (new subproject?)
> >>>> - daffodil-debug-io-NAME
> >>>>   - provides implementation of a specific IO adapter
> >>>>   - multiple projects possible (daffodil-debugger-akka,
> >>>> daffodil-debugger-zio, etc)
> >>>>   - supported ones live in their own subprojects, but others can be
> >>>> plugged in from external sources
> >>>>   - ability to support multiple implementations reduces risk of
> lock-in
> >>>> - debugger applications
> >>>>   - maintained in external repositories
> >>>>   - depending on the IO implementation these could execute in a
> >> separate
> >>>> process or on a separate machine
> >>>>   - like Steve said, could be any language / framework
> >>>>
> >>>> Three types of reference implementations / sample applications could
> >> also
> >>>> guide the development of the API
> >>>>   1. a replacement for the existing TUI debugger, expected to end up
> >> with
> >>>> at minimum the same functionality as the current one.
> >>>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
> >>>>   3. an IDE integration
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Also I'm working on some reference implementations of these concepts
> >>>> using Akka and Zio.  Not quite ready to talk through it yet, but the
> >> code
> >>>> is here https://github.com/jw3/example-daffodil-debug
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Yep, something like that seems very reasonable for dealing with large
> >>>>> infosets. But it still feels like we still run into usability issues.
> >>>>> For example, what if a user wants to see more? We need some
> >>>>> configuration options to increase what we've elided. It's not big,
> but
> >>>>> every new thing that needs configuration adds complexity and
> decreases
> >>>>> usability.
> >>>>>
> >>>>> And I think the only reason we are trying to spend effort eliding
> >>>>> things is because we're limited to this gdb-like interface where you
> >> can
> >>>>> only print out a little information at a time.
> >>>>>
> >>>>> I think what would really help is to dump this gdb interface and instead
> use
> >>>>> multiple windows/views. As a really close example to what I imagine,
> I
> >>>>> recently came across this hex editor:
> >>>>>
> >>>>> https://www.synalysis.net/
> >>>>>
> >>>>> The screenshots are a bit small so it's not super clear, but this
> tool
> >>>>> has one view for the data in hex, and one view for a tree of parsed
> >>>>> results (which is very similar to our infoset). The "infoset" view
> has
> >>>>> information like offset/length/value, and can be related back to the
> >>>>> data view to find the actual bits.
> >>>>>
> >>>>> I imagine the "next generation daffodil debugger" to look much like
> >>>>> this. As data is parsed, the infoset view fills up. This view could
> act
> >>>>> like a standard GUI tree so you could collapse sections or scroll
> >> around
> >>>>> to show just the parts you care about, and have search capabilities
> to
> >>>>> quickly jump around. The advantage here is you no longer really need
> >>>>> automated eliding or heuristics for what the user *might* care about.
> >>>>> You just show the whole thing and let user scroll around. As daffodil
> >>>>> parses and backtracks, this tree grows or shrinks.
> >>>>>
> >>>>> I also imagine you could have a cursor moving around the hex view, so
> >> as
> >>>>> daffodil moves around (e.g. scanning for delimiters, extracting
> >>>>> integers), one could update this data view to show what daffodil is
> >>>>> doing and where it is.
> >>>>>
> >>>>> I also imagine there could be other views as well. For example, a
> schema
> >>>>> view to show where in the schema daffodil is, and to add/remove
> >>>>> breakpoints. And an information view for things like variables,
> >> in-scope
> >>>>> delimiters, PoU's, etc.
> >>>>>
> >>>>> The only reason I mention a debug protocol is that it would allow this
> GUI
> >>>>> to be more easily written in something other than Java/Scala to take
> >>>>> advantage of other GUI toolkits. It's been a long while since I've
> done
> >>>>> anything with Java GUIs, but they seemed pretty poor the last I
> looked
> >>>>> at them. It would even allow for a TUI, which Java has little/no support
> >>>>> for. It also enables things like remote debugging if a socket IPC was
> >>>>> used. Though I'm not sure all of that is necessary. Just thinking
> what
> >>>>> would be ideal, and it can always be pared back.
> >>>>>
> >>>>>
> >>>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> >>>>>> I don't think of it as a daffodil debug protocol, but just a
> >>>>> separation of concerns between display of information and the
> >> behaviors of
> >>>>> parse/unparse that need to be points where users can pause, and data
> >>>>> structures available to display.
> >>>>>>
> >>>>>> E.g., it is 100% a display issue that the infoset (shown as XML) is
> >>>>> clumsy, too big, etc.  The infoset is available in the processor
> >> state, and
> >>>>> one can examine the current node, enclosing node, prior sibling(s),
> >>>>> following sibling(s), etc. One can elide contents that are too big
> for
> >>>>> hexBinary, etc.
> >>>>>>
> >>>>>> I think this problem, how to display the infoset with sensible
> limits
> >>>>> on sizing, is fairly easy to come up with some design for, that will
> at
> >>>>> least be (1) always fairly small (2) much more useful in more cases.
> It
> >>>>> won't be perfect but can be much better than what we do now.
> >>>>>>
> >>>>>> One sensible display "mode" should be that displaying the context
> >>>>> surrounding the current element (when parsing or unparsing) displays
> at
> >>>>> most N-lines. (N/2 before, N/2 after) with a maximum length of L
> >> characters
> >>>>> (settable within reason ?)
> >>>>>>
> >>>>>> Sibling and enclosing nodes would be displayed eliding their
> contents
> >>>>> to at most 1 line.
> >>>>>>
> >>>>>> Here's an example of what I mean. Displaying up to M=10 lines total:
> >>>>>>
> >>>>>> ...
> >>>>>> <enclosingParent1>
> >>>>>>    ...
> >>>>>>    <priorSibling2>89ab782 ...</...>
> >>>>>>    <priorSibling1>some text is here and some more text</...>
> >>>>>>    <currentNode>value might be some big thing which needs to be
> >> elided
> >>>>> ...</...>
> >>>>>>    <followingSibling1> ... </...>
> >>>>>>    ???
> >>>>>> </enclosingParent1>
> >>>>>> ???
> >>>>>>
> >>>>>> The </...> is just an idea to reduce XML matching end-tag clutter.
> >>>>>>
> >>>>>> The ... on a line alone or where element content would appear
> >>>>> generally means 1 or more other siblings. The way the display above
> >> starts
> >>>>> with ... means that this is a relative inner nest, not starting from
> >> the
> >>>>> absolute root.
> >>>>>>
> >>>>>> The ... within simple content means that content is elided to fit on
> >>>>> one line. Always follows some text characters to differentiate from
> the
> >>>>> child-element context.
> >>>>>>
> >>>>>> The ??? means zero or more other siblings.
> >>>>>>
> >>>>>> I used bold italic above to point out that the current node would be
> >>>>> highlighted somehow. Probably a way to do this that doesn't require
> >> display
> >>>>> modes would be useful. E.g., a text marker like ">>>" as in:
> >>>>>>
> >>>>>>>>> <currentNode>value .... </...>
> >>>>>>
> >>>>>> might be better, particularly for a trace output being dumped to a
> >>>>> text file.
> >>>>>>
> >>>>>> I made the above example an unparser kind of example by showing a
> >>>>> following sibling that exists that is after the current node.
> >>>>>>
> >>>>>> I think the key concept is that any sibling node is displayed in a
> >> way
> >>>>> that fits on one line.
> >>>>>> E.g., even if the element name was really long, I'd suggest:
> >>>>>>
> >>>>>>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> >>>>>>
> >>>>>> Where the element name itself gets elided because it is too long.
> >>>>>>
> >>>>>> A thought. Note that the above presentation is shown as quasi-XML,
> >> but
> >>>>> there's nothing XML-specific about it. A JSON-friendly equivalent
> >> could be
> >>>>> done as well:
> >>>>>>
> >>>>>> enclosingParent1 = {
> >>>>>>    ...
> >>>>>>    priorSibling2 = "89ab782..."
> >>>>>>    priorSibling1 = "some text is here and some more text"
> >>>>>>    currentNode = "value might be some big thing which needs to be
> >>>>> elided ..."
> >>>>>>    followingSibling1 = { ... }
> >>>>>>    ???
> >>>>>> }
> >>>>>>
> >>>>>> That's enough for 1 email thread on this debug topic.
> >>>>>>
> >>>>>>
> >>>>>> ________________________________
> >>>>>> From: Steve Lawrence <sl...@apache.org>
> >>>>>> Sent: Tuesday, January 5, 2021 2:26 PM
> >>>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> >>>>>> Subject: The future of the daffodil DFDL schema debugger?
> >>>>>>
> >>>>>>
> >>>>>> Now that we're in a new year, I'd like to start a discussion about
> >> the
> >>>>>> Daffodil DFDL Schema debugger and how it might be improved to be
> more
> >>>>>> useful.
> >>>>>>
> >>>>>> Note that this is not the capabilities to debug Daffodil itself in
> >>>>>> something like Eclipse/IntelliJ, but the ability for Daffodil to
> >>>>> provide
> >>>>>> enough extra information during a parse/unparse so that a schema
> >>>>>> developer can get an idea of what Daffodil is doing. This makes it
> >>>>>> easier for users (rather than developers) to determine why a schema
> >>>>>> isn't giving the expected parse/unparse result (either because of bad
> >>>>> data
> >>>>>> or a faulty schema).
> >>>>>>
> >>>>>> The current state of the debugger is enabled by providing the
> --debug
> >>>>> or
> >>>>>> --trace flags in the CLI. More information about that here:
> >>>>>>
> >>>>>> https://daffodil.apache.org/debugger/
> >>>>>>
> >>>>>> This enables a TUI and commands somewhat similar to GDB, providing
> >>>>> things
> >>>>>> like breakpoints, steps, displaying the current infoset, displaying a
> >> dump
> >>>>>> of the data, etc.
> >>>>>>
> >>>>>> Although I find this tool pretty useful, it definitely has some
> >> glaring
> >>>>>> issues.
> >>>>>>
> >>>>>> The most glaring to me is that it really isn't useful at all for
> >>>>>> debugging unparse. The data dumps only include the main
> >> output stream,
> >>>>>> so determining things like suspensions and buffered output is
> >> impossible.
> >>>>>>
> >>>>>> Another issue is the infoset output. When outputting the infoset,
> the
> >>>>>> debugger currently just walks the entire thing and converts it to
> XML
> >>>>>> and displays the XML. For large infosets, this is excessive and can
> make
> >>>>> it
> >>>>>> impossible to use, even with some configurations that limit how much
> >> of
> >>>>>> that infoset is actually printed to the screen. Also things like
> >> large
> >>>>>> hex binary blobs create excessive and unusable output.
> >>>>>>
> >>>>>> Another thing I feel is missing is a schema view. Right now it's
> very
> >>>>>> difficult to know where in the schema Daffodil actually is.
> >>>>>>
> >>>>>> I think these issues just need some thought and improvement. One could
> >>>>>> imagine a better way to stringify our unparse buffers for debug. One
> >>>>>> could imagine a way to receive infoset state changes so the debugger
> >> can
> >>>>>> track things like backtracks and remove infosets. One could imagine a
> >> way
> >>>>>> to display the schema.
> >>>>>>
> >>>>>> We just need a better way to stringify the current state of the
> >> unparse
> >>>>>> data including buffers, and we need a way for the debugger to
> >>>>> receive
> >>>>>> state change information about infoset so it can update displays
> >> rather
> >>>>>> than just constantly printing the entire infoset.
> >>>>>>
> >>>>>> However, I think another big issue is just usability in
> >> general.
> >>>>> I
> >>>>>> think the CLI usage is reasonable, but it's not always user
> friendly,
> >>>>>> and it is difficult to view multiple things at the same time. I think
> >>>>>> because of this very few people even use this tool. So this is
> >>>>>> perhaps something worth focusing on.
> >>>>>>
> >>>>>> My first thought to improving this usability issue would be to
> >>>>> implement
> >>>>>> the Debug Adapter Protocol (DAP)
> >>>>>> (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
> >>>>>> which many IDEs implement. With this implemented, Daffodil could be
> >>>>>> plugged in to any IDE that supports it and essentially get debugging
> >>>>> for
> >>>>>> free, without the need to worry about the GUI elements.
> >>>>>>
> >>>>>> I do have concerns that this just wouldn't have enough functionality
> >>>>>> that we'd really need. For example, DAP really only has the ability to show
> >>>>>> code (Daffodil's equivalent is the DFDL schema). There isn't a way
> to
> >>>>>> show a live view of the infoset or data. Most DAP IDEs do have a
> >>>>>> console output, so we could potentially make it so the console
> output
> >>>>> is
> >>>>>> a live view of infoset/data. But I'm not even sure most DAP friendly
> >>>>>> IDEs could support this kind of console output. Does anyone have
> >>>>>> familiarity with DAP IDEs and what kinds of console capabilities
> >>>>> are
> >>>>>> available?
> >>>>>>
> >>>>>> I also looked into TUI libraries with the idea that we could just
> >>>>> extend
> >>>>>> our current debugger user interface to be a bit friendlier.
> >>>>>> Unfortunately, there aren't too many Java/Scala TUI libraries and
> >> those
> >>>>>> that do exist don't have Apache friendly licenses. We also want to
> be
> >>>>>> careful about increasing dependencies just for a debugger that many
> >>>>> people
> >>>>>> might not use, so large graphics libraries are probably out of the
> >>>>> question.
> >>>>>>
> >>>>>> This also makes me wonder if an approach worth taking for the future
> >> of
> >>>>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
> >>>>>> Protocol". I imagine it would be loosely based on DAP (which is
> >>>>>> essentially JSON message based) but could be targeted to the things
> >>>>> that
> >>>>>> a DFDL schema debugger would really need. An added benefit with some
> >>>>>> sort of protocol is the debugger interface can be uncoupled from
> >>>>>> Daffodil itself, so we could implement a TUI/GUI/whatever in any
> >>>>>> language/GUI framework and just have it communicate the protocol
> over
> >>>>>> some form of IPC. Another benefit is that any future backends could
> >>>>>> implement this protocol and so a single debugger could hook into
> >>>>>> different backends without much issue. Unfortunately, defining such
> a
> >>>>>> protocol might be a large task, but we do have our existing debug
> >>>>>> infrastructure and things like DAP to guide its development/design.
> >>>>>>
> >>>>>> Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
> >> we
> >>>>>> really just need the few improvements mentioned to the existing
> >>>>>> debugger. Is that enough to make it usable? Or is an entirely
> >> different
> >>>>>> approach needed to debugging schemas?
> >>>>>>
> >>>>>
> >>>>>
> >>
> >
>
>

Re: The future of the daffodil DFDL schema debugger?

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
Some thoughts re: data format debugger

I suggest we enumerate

  *   every single piece of state of the parser,
  *   every single piece of state of the unparser,
  *   each action/step of the parser (every parse combinator or primitive, and their subactions),
  *   and each action/step of the unparser (every unparse combinator, primitive, suspension, ...)

and wire-frame/mock-up some display for each piece of state, and how, if changed by a step, the change to that piece of state would be displayed.

We can write down the nuances associated with these data items/actions that impact debugger display.

Some of these states/actions will be analogous to things in conventional debuggers (e.g., looking at the values of variables). Others will be specific to DFDL needs (e.g., looking at layers in the data stream, visualizing delimiter scanning success/failure, backtracking).

Core concepts a debugger needs are framing vs. content vs. value, and the "regions" in the data stream that make these up. The framing includes initiators, terminators, separators, alignment regions, prefix-length regions, leading/trailing skip regions, unused regions. Those surround the content region, and when padding/filling is involved (for simple types that are textual) the content region contains leading pad and trailing pad regions, surrounding the value region.

An example of a graphical nested-box representation of these regions is in a design note about Daffodil:

https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/
(see section "Details of Unique and Shared Regions")
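
To make the nesting concrete, here is a minimal Python sketch of framing regions surrounding a content region, which in turn surrounds pad and value regions. It is only an illustration: the `Region` class, the bit offsets, and the example field "[  42 ]" are all hypothetical and are not Daffodil types or output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Region:
    """A named span of bits in the data stream (hypothetical model, not a Daffodil class)."""
    name: str
    start_bit: int
    length_bits: int
    children: List["Region"] = field(default_factory=list)

    @property
    def end_bit(self) -> int:
        return self.start_bit + self.length_bits


def render(region: Region, depth: int = 0) -> List[str]:
    """Return one indented line per region, outermost first."""
    lines = [f"{'  ' * depth}{region.name} [{region.start_bit}, {region.end_bit})"]
    for child in region.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical padded text field "[  42 ]" at 8 bits per character:
# initiator "[", content (leading pad, value "42", trailing pad), terminator "]".
value = Region("value", 24, 16)
content = Region("content", 8, 40, [
    Region("leadingPad", 8, 16),
    value,
    Region("trailingPad", 40, 8),
])
element = Region("element", 0, 56, [
    Region("initiator", 0, 8),
    content,
    Region("terminator", 48, 8),
])

if __name__ == "__main__":
    print("\n".join(render(element)))
```

A debugger display could walk a structure like this to highlight which region the processor is currently working in.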

The way to start this effort is to look at the UState and PState classes. These are the state blocks. Every piece of these is potentially important to the debugger.

Lastly, an important aspect of Daffodil is the streaming behavior of the parser and unparser. While I believe it is more important to get something working than for it to cover every feature, this is an area where failing to anticipate how it needs to work is likely to lock us out of a future design that accommodates streaming.

So the parser doesn't produce an infoset. It produces a stream of infoset events, or callbacks to be exact.
Due to backtracking in the parser, these events can be held up for a substantial time while the parser continues. So we can't assume that there is any correlation between parser activity and the production of events.

The unparser doesn't consume an infoset. It consumes a stream of infoset events. Specifically, the unparser is the callback handler for unparse infoset events.

The infoset gets trimmed so that we needn't build up the complete infoset tree in memory. As parse-events are produced, no-longer necessary parts of the infoset are pruned away. Similarly, when unparsing, once a part of the infoset has been unparsed, that part of the infoset tree is pruned away if no longer needed.
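
The hold-up of events during backtracking can be sketched roughly as follows. This is a simplified Python model under stated assumptions, not how Daffodil implements it: the class `InfosetEventStream` and its `mark`/`resolve`/`backtrack` methods are hypothetical names, and events are plain strings.

```python
from typing import Callable, List


class InfosetEventStream:
    """Buffers infoset events while any point of uncertainty is unresolved (hypothetical model)."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._pending: List[str] = []
        self._marks: List[int] = []  # index into _pending at each open mark

    def mark(self) -> None:
        """Open a point of uncertainty; later events are speculative."""
        self._marks.append(len(self._pending))

    def emit(self, event: str) -> None:
        self._pending.append(event)
        self._flush()

    def resolve(self) -> None:
        """The speculation succeeded; its events may now be delivered."""
        self._marks.pop()
        self._flush()

    def backtrack(self) -> None:
        """The speculation failed; discard events emitted since the mark."""
        del self._pending[self._marks.pop():]

    def _flush(self) -> None:
        # Deliver only when no point of uncertainty remains open.
        if not self._marks:
            for event in self._pending:
                self._handler(event)
            self._pending.clear()


delivered: List[str] = []
stream = InfosetEventStream(delivered.append)
stream.emit("start record")
stream.mark()
stream.emit("start fieldA")  # speculative branch that fails
stream.backtrack()
stream.mark()
stream.emit("start fieldB")
stream.emit("end fieldB")
held_while_uncertain = list(delivered)  # only "start record" so far
stream.resolve()
stream.emit("end record")
```

The point for a debugger is that `held_while_uncertain` lags behind what the parser has actually done, so any display driven by delivered events alone would miss the speculative work.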


________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Thursday, April 22, 2021 9:32 AM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: The future of the daffodil DFDL schema debugger?

Some thoughts related to showing the infoset as if it were a variable, as
this is prototyped:

1) How do DAP/IDEs represent very large hierarchical data? Infosets can
be huge, and most of the time a user only cares about the most recent
infoset item. So some way to follow and show just the most recent part of
the infoset is important. The current Daffodil debugger has an
"infosetLines" setting so that it only shows the most recent X number of
lines, which is mostly all a user cares about when stepping through a parse.

2) Infoset items are added and removed very frequently during a parse.
Currently, when the Daffodil debugger shows the infoset it just converts
the entire thing to XML and displays that. This doesn't work at all for
large infosets since this can take a long time. I was hoping this issue
would get resolved with this new debugging infrastructure. When the
infoset is modified, we ideally want a way to specify via DAP that parts
of the variable hierarchy were added/removed rather than having to send
the entire infoset during every variable update.

3) I can imagine a feature where a user would want to select an infoset
item and jump to the associated schema element, or query information
about that infoset item (e.g., what bit position did it start at, what
was the length). We don't have this right now, but it would be really nice
to have. This suggests that we need metadata associated with each of the
variables. Does DAP have a concept of that, and do IDEs have a way to
show it?
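
As an illustration of point 2, the kind of add/remove update in question could be computed by comparing two snapshots of the infoset, as in the Python sketch below. The snapshot shape (a mapping from infoset path to displayed value) and the function `infoset_delta` are assumptions made for this example only; whether DAP can carry such a delta is exactly the open question above.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot shape: infoset path -> displayed value.
Snapshot = Dict[str, str]


def infoset_delta(before: Snapshot, after: Snapshot) -> List[Tuple[str, str]]:
    """Return (operation, path) pairs describing how `after` differs from `before`."""
    ops: List[Tuple[str, str]] = []
    for path in sorted(before.keys() - after.keys()):
        ops.append(("remove", path))
    for path in sorted(after.keys() - before.keys()):
        ops.append(("add", path))
    for path in sorted(before.keys() & after.keys()):
        if before[path] != after[path]:
            ops.append(("change", path))
    return ops


# A backtrack removed /rec/b, and the parser then added /rec/c.
step1 = {"/rec/a": "1", "/rec/b": "x"}
step2 = {"/rec/a": "1", "/rec/c": "y"}
delta = infoset_delta(step1, step2)
```

Sending only `delta` rather than the whole infoset keeps the per-step cost proportional to what changed, though a real implementation would more likely receive change notifications directly than diff full snapshots.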

On 4/21/21 7:52 PM, Adam Rosien wrote:
> I've been reading up on DAP and wanted to share...
>
>> There are many areas though that are unique to Daffodil that have no
> representation in the spec.  These things (like InputStream, Infoset, PoU,
> different variable types, backtracking, etc) will need an extension to
> DAP.  This really boils down to defining these things to fit under the DAP
> BaseProtocol and enabling handling of those objects on both the front and
> back ends.
>
> To me, much of the current state exposed by the (Daffodil) Debugger
> translates directly to a DAP Variable[1]. DAP Variables can be
> nested/hierarchical, so they could (potentially) model larger data like the
> infoset. I can imagine shoving all the current state into Variables as a
> proof-of-concept.
>
> It also seems like the processing stack maintained by the Daffodil PState,
> where each item references the relevant schema element, could translate to
> the DAP StackFrame type [2]. That is, the path from the schema root to the
> currently processing schema element becomes the "call stack". (Apologies if
> I don't have all the Daffodil terms lined up correctly.)
>
> For displaying the input data and processing progress, I looked at a few
> existing VS Code extensions that provided non-builtin views, some of which
> interact with their DAP debugger code [3] [4] [5] [6].
>
> Finally, I took a cursory look at scala-debug-adapter [7], which, for
> reference, wraps Microsoft's java-debug [8] implementation of DAP. I was
> curious about the set of request/response and event types. Additionally,
> the TypeScript API to VS Code offers custom DAP requests and responses, but
> I couldn't find the equivalent notion in the java-debug project.
>
> .. Adam
>
> [1]
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
> [2]
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_StackFrame
> [3] https://github.com/scalameta/metals-vscode (provides a debugger and
> non-debugger custom UI)
> [4] https://github.com/microsoft/vscode-cpptools (debugger + memory view)
> [5] https://marketplace.visualstudio.com/items?itemName=marus25.cortex-debug
> (debugger + memory view,
> https://github.com/Marus/cortex-debug/blob/master/src/frontend/memory_content_provider.ts
> )
> [6]
> https://marketplace.visualstudio.com/items?itemName=slevesque.vscode-hexdump
> (extension for hexdumps that could be controlled by other extensions)
> [7] https://github.com/scalacenter/scala-debug-adapter
> [8] https://github.com/microsoft/java-debug
>
> On Tue, Apr 20, 2021 at 7:08 AM John Wass <jw...@gmail.com> wrote:
>
>>> Going to look deeper into how DAP might fit with Daffodil
>>
>> Have been looking over DAP and getting a good feeling about it. The
>> specification [1] seems general enough that it could be applied to Daffodil
>> and cover a swath of common operations (like start, stop, break, continue,
>> code locations, variables, etc).
>>
>> There are many areas though that are unique to Daffodil that have no
>> representation in the spec.  These things (like InputStream, Infoset, PoU,
>> different variable types, backtracking, etc) will need an extension to
>> DAP.  This really boils down to defining these things to fit under the DAP
>> BaseProtocol and enabling handling of those objects on both the front and
>> back ends.
>>
>> On the backend we need a Daffodil DAP protocol server.  Existing JVM
>> implementations (like Java [2], Scala [3]) are tied closely to JDI and
>> would bring a lot of extra baggage to work around that.  Developing a
>> Daffodil specific implementation is no small task, but feasible.  There are
>> several existing implementations on the JVM that are close and can be
>> looked at for reference.
>>
>> The backend implementation would look similar to what was described in an
>> earlier post.  We could use ZIO/Akka/etc to implement the backend Protocol
>> Server to enable the IO between the Daffodil process and the DAP clients.
>> This implementation would now be guided by the DAP specification.
>>
>> With the protocol and backend extended to fit Daffodil that leaves the
>> frontend.  In theory an existing IDE plugin should get pretty close to
>> being able to perform the common debug operations mentioned above.  To
>> support the Daffodil extensions there will need to be handling of the
>> extended protocol into whatever views are desired/applicable.
>>
>>> Also looking into the Java Debug Interface (JDI) for comparison.
>>
>> JDI appears to be the wrong level of abstraction for what we are talking
>> about in debugging Daffodil for schema development.  While DAP does do JVM
>> debugging (through a JDI DAP impl) it also generalizes to many other
>> debugging scenarios.  JDI on the other hand is very tied to the JVM.
>>
>> Extending the JDI appears to be more complex than dealing with DAP, and
>> even though the JDI API is mostly defined with interfaces, there are choke
>> points that limit to JVM concepts.  For example jdi.Value has a finite set
>> of JVM types that it works with; it's not clear where Daffodil types would
>> plug in, if that is even possible.
>>
>> The final note is that unique Daffodil features wouldn’t get to IDE support
>> any faster with JDI.  In some cases, like VS Code, you would still need an
>> extended DAP to support these features.
>>
>>> and depending on how it shakes out will update the example to show
>> integration
>>
>> It would appear wise to investigate DAP further.  Next step is to refine
>> these thoughts with a prototype. I started an implementation in the example
>> debugger project [4] to try to run the current example on a _minimal_ DAP
>> implementation.
>>
>>
>> [1] https://microsoft.github.io/debug-adapter-protocol/specification
>> [2] https://github.com/Microsoft/java-debug
>> [3] https://github.com/scalacenter/scala-debug-adapter
>> [4] https://github.com/jw3/example-daffodil-debug
>>
>>
>> On Mon, Apr 12, 2021 at 9:58 AM John Wass <jw...@gmail.com> wrote:
>>
>>>> the code is here https://github.com/jw3/example-daffodil-debug
>>>
>>> There is now a complete console based example for Zio that demonstrates
>>> controlling the debug flow while distributing the current state to three
>>> "displays".
>>> 1. infoset at current step
>>> 2. diff of infoset against previous step
>>> 3. bit position and value of data.
>>>
>>> These displays are very rudimentary but demonstrate the ability to
>>> asynchronously populate multiple views while synchronously controlling
>> the
>>> debug loop.
>>>
>>>> - The new protocol being informed by existing debugger and DAP is key
>>>
>>> Going to look deeper into how DAP might fit with Daffodil, and depending
>>> on how it shakes out will update the example to show integration.
>>>
>>> Some interesting links to start with
>>> - https://github.com/scalacenter/scala-debug-adapter
>>> -
>>>
>> https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
>>> - https://github.com/microsoft/java-debug
>>>
>>> Also looking into the Java Debug Interface (JDI) for comparison.
>>>
>>>
>>> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:
>>>
>>>> Revisiting this post after doing some debugger related work and thinking
>>>> about debug protocol/adapters to connect external tooling to the debug
>>>> process.
>>>>
>>>> This comment is good
>>>>
>>>>> This also makes me wonder if an approach worth taking for the future
>> of
>>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
>>>> Protocol". I imagine it would be loosely based on DAP (which is
>>>> essentially JSON message based) but could be targeted to the things
>> that a
>>>> DFDL schema debugger would really need. An added benefit with some
>> sort of
>>>> protocol is the debugger interface can be uncoupled from Daffodil
>>>> itself, so we could implement a TUI/GUI/whatever in any  language/GUI
>>>> framework and just have it communicate the protocol over some form of
>>>> IPC. Another benefit is that any future backends could implement this
>>>> protocol and so a single debugger could hook into different backends
>>>> without much issue. Unfortunately, defining such a protocol might be a
>>>> large task, but we do have our existing debug infrastructure and things
>>>> like DAP to guide its development/design.
>>>>
>>>> Some thoughts on this
>>>> - Defining the protocol will be a large task, but a minimal version
>>>> should get up and round tripping quickly with a minimal subset of the
>>>> protocol.
>>>> - The new protocol being informed by existing debugger and DAP is key
>>>> - Uncoupling from Daffodil is key
>>>> - Adapt the Daffodil protocol to produce DAP after the fact so as not to
>>>> constrain Daffodil debugging capability
>>>> - We don't need to tie the protocol or adapters to a single framework,
>>>> implementations of the IO layer should be simple enough to support
>> multiple
>>>> things (eg Akka, Zio, "basic" ...)
>>>> - The current debugger lives in runtime1, but can we make an abstract
>> API
>>>> that any runtime would implement?
>>>>
>>>> Maybe a solution is structured like this
>>>> - daffodil-debug-api:
>>>>   - protocol model
>>>>   - interfaces: debugger / IO adapter / etc
>>>>   - lives in daffodil repo (new subproject?)
>>>> - daffodil-debug-io-NAME
>>>>   - provides implementation of a specific IO adapter
>>>>   - multiple projects possible (daffodil-debugger-akka,
>>>> daffodil-debugger-zio, etc)
>>>>   - supported ones live in their own subprojects, but others can be
>>>> plugged in from external sources
>>>>   - ability to support multiple implementations reduces risk of lock-in
>>>> - debugger applications
>>>>   - maintained in external repositories
>>>>   - depending on the IO implementation these could execute in a
>> separate
>>>> process or on a separate machine
>>>>   - like Steve said, could be any language / framework
>>>>
>>>> Three types of reference implementations / sample applications could
>> also
>>>> guide the development of the API
>>>>   1. a replacement for the existing TUI debugger, expected to end up
>> with
>>>> at minimum the same functionality as the current one.
>>>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
>>>>   3. an IDE integration
>>>>
>>>> Thoughts?
>>>>
>>>> Also I'm working on some reference implementations of these concepts
>>>> using Akka and Zio.  Not quite ready to talk through it yet, but the
>> code
>>>> is here https://github.com/jw3/example-daffodil-debug
>>>>
>>>>
>>>>
>>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
>>>> wrote:
>>>>
>>>>> Yep, something like that seems very reasonable for dealing with large
>>>>> infosets. But it still feels like we still run into usability issues.
>>>>> For example, what if a user wants to see more? We need some
>>>>> configuration options to increase what we've elided. It's not big, but
>>>>> every new thing that needs configuration adds complexity and decreases
>>>>> usability.
>>>>>
>>>>> And I think the only reason we are trying to spend effort eliding
>>>>> things is because we're limited to this gdb-like interface where you
>> can
>>>>> only print out a little information at a time.
>>>>>
>>>>> I think what would really help is to dump this gdb interface and instead use
>>>>> multiple windows/views. As a really close example to what I imagine, I
>>>>> recently came across this hex editor:
>>>>>
>>>>> https://www.synalysis.net/
>>>>>
>>>>> The screenshots are a bit small so it's not super clear, but this tool
>>>>> has one view for the data in hex, and one view for a tree of parsed
>>>>> results (which is very similar to our infoset). The "infoset" view has
>>>>> information like offset/length/value, and can be related back to the
>>>>> data view to find the actual bits.
>>>>>
>>>>> I imagine the "next generation daffodil debugger" to look much like
>>>>> this. As data is parsed, the infoset view fills up. This view could act
>>>>> like a standard GUI tree so you could collapse sections or scroll
>> around
>>>>> to show just the parts you care about, and have search capabilities to
>>>>> quickly jump around. The advantage here is you no longer really need
>>>>> automated eliding or heuristics for what the user *might* care about.
>>>>> You just show the whole thing and let the user scroll around. As daffodil
>>>>> parses and backtracks, this tree grows or shrinks.
>>>>>
>>>>> I also imagine you could have a cursor moving around the hex view, so
>> as
>>>>> daffodil moves around (e.g. scanning for delimiters, extracting
>>>>> integers), one could update this data view to show what daffodil is
>>>>> doing and where it is.
>>>>>
>>>>> I also imagine there could be other views as well. For example, a schema
>>>>> view to show where in the schema daffodil is, and to add/remove
>>>>> breakpoints. And an information view for things like variables,
>> in-scope
>>>>> delimiters, PoU's, etc.
>>>>>
>>>>> The only reason I mention a debug protocol is that it would allow this GUI
>>>>> to be more easily written in something other than Java/Scala to take
>>>>> advantage of other GUI toolkits. It's been a long while since I've done
>>>>> anything with Java GUIs, but they seemed pretty poor the last I looked
>>>>> at them. It would even allow for a TUI, which Java has little/no support
>>>>> for. It also enables things like remote debugging if a socket IPC was
>>>>> used. Though I'm not sure all of that is necessary. Just thinking what
>>>>> would be ideal, and it can always be pared back.
>>>>>
>>>>>
>>>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>>>>>> I don't think of it as a daffodil debug protocol, but just a
>>>>> separation of concerns between display of information and the
>> behaviors of
>>>>> parse/unparse that need to be points where users can pause, and data
>>>>> structures available to display.
>>>>>>
>>>>>> E.g., it is 100% a display issue that the infoset (shown as XML) is
>>>>> clumsy, too big, etc.  The infoset is available in the processor
>> state, and
>>>>> one can examine the current node, enclosing node, prior sibling(s),
>>>>> following sibling(s), etc. One can elide contents that are too big for
>>>>> hexBinary, etc.
>>>>>>
>>>>>> I think this problem, how to display the infoset with sensible limits
>>>>> on sizing, is fairly easy to come up with some design for, that will at
>>>>> least be (1) always fairly small (2) much more useful in more cases. It
>>>>> won't be perfect but can be much better than what we do now.
>>>>>>
>>>>>> One sensible display "mode" should be that displaying the context
>>>>> surrounding the current element (when parsing or unparsing) displays at
>>>>> most N-lines. (N/2 before, N/2 after) with a maximum length of L
>> characters
>>>>> (settable within reason ?)
>>>>>>
>>>>>> Sibling and enclosing nodes would be displayed eliding their contents
>>>>> to at most 1 line.
>>>>>>
>>>>>> Here's an example of what I mean. Displaying up to N=10 lines total:
>>>>>>
>>>>>> ...
>>>>>> <enclosingParent1>
>>>>>>    ...
>>>>>>    <priorSibling2>89ab782 ...</...>
>>>>>>    <priorSibling1>some text is here and some more text</...>
>>>>>>    <currentNode>value might be some big thing which needs to be
>> elided
>>>>> ...</...>
>>>>>>    <followingSibling1> ... </...>
>>>>>>    ???
>>>>>> </enclosingParent1>
>>>>>> ???
>>>>>>
>>>>>> The </...> is just an idea to reduce XML matching end-tag clutter.
>>>>>>
>>>>>> The ... on a line alone or where element content would appear
>>>>> generally means 1 or more other siblings. The way the display above
>> starts
>>>>> with ... means that this is a relative inner nest, not starting from
>> the
>>>>> absolute root.
>>>>>>
>>>>>> The ... within simple content means that content is elided to fit on
>>>>> one line. Always follows some text characters to differentiate from the
>>>>> child-element context.
>>>>>>
>>>>>> The ??? means zero or more other siblings.
>>>>>>
>>>>>> I used bold italic above to point out that the current node would be
>>>>> highlighted somehow. Probably a way to do this that doesn't require
>> display
>>>>> modes would be useful. E.g., a text marker like ">>>" as in:
>>>>>>
>>>>>>>>> <currentNode>value .... </...>
>>>>>>
>>>>>> might be better, particularly for a trace output being dumped to a
>>>>> text file.
>>>>>>
>>>>>> I made the above example an unparser kind of example by showing a
>>>>> following sibling that exists that is after the current node.
>>>>>>
>>>>>> I think the key concept is that any sibling node is displayed in a
>> way
>>>>> that fits on one line.
>>>>>> E.g., even if the element name was really long, I'd suggest:
>>>>>>
>>>>>>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>>>>>>
>>>>>> Where the element name itself gets elided because it is too long.
>>>>>>
>>>>>> A thought. Note that the above presentation is shown as quasi-XML,
>> but
>>>>> there's nothing XML-specific about it. A JSON-friendly equivalent
>> could be
>>>>> done as well:
>>>>>>
>>>>>> enclosingParent1 = {
>>>>>>    ...
>>>>>>    priorSibling2 = "89ab782..."
>>>>>>    priorSibling1 = "some text is here and some more text"
>>>>>>    currentNode = "value might be some big thing which needs to be
>>>>> elided ..."
>>>>>>    followingSibling1 = { ... }
>>>>>>    ???
>>>>>> }
>>>>>>
>>>>>> That's enough for 1 email thread on this debug topic.
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> From: Steve Lawrence <sl...@apache.org>
>>>>>> Sent: Tuesday, January 5, 2021 2:26 PM
>>>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>>>>> Subject: The future of the daffodil DFDL schema debugger?
>>>>>>
>>>>>>
>>>>>> Now that we're in a new year, I'd like to start a discussion about
>> the
>>>>>> Daffodil DFDL Schema debugger and how it might be improved to be more
>>>>>> useful.
>>>>>>
>>>>>> Note that this is not the capabilities to debug Daffodil itself in
>>>>>> something like Eclipse/IntelliJ, but the ability for Daffodil to
>>>>> provide
>>>>>> enough extra information during a parse/unparse so that a schema
>>>>>> developer can get an idea of what Daffodil is doing. This makes it
>>>>>> easier for users (rather than developers) to determine why a schema
>>>>>> isn't giving the expected parse/unparse result (either because of bad
>>>>> data
>>>>>> or a faulty schema).
>>>>>>
>>>>>> The current state of the debugger is enabled by providing the --debug
>>>>> or
>>>>>> --trace flags in the CLI. More information about that here:
>>>>>>
>>>>>> https://daffodil.apache.org/debugger/
>>>>>>
>>>>>> This enables a TUI and commands somewhat similar to GDB, providing
>>>>> things
>>>>>> like breakpoints, steps, displaying the current infoset, displaying a
>> dump
>>>>>> of the data, etc.
>>>>>>
>>>>>> Although I find this tool pretty useful, it definitely has some
>> glaring
>>>>>> issues.
>>>>>>
>>>>>> The most glaring to me is that it really isn't useful at all for
>>>>>> debugging unparse. The data dumps only include the main
>> output stream,
>>>>>> so determining things like suspensions and buffered output is
>> impossible.
>>>>>>
>>>>>> Another issue is the infoset output. When outputting the infoset, the
>>>>>> debugger currently just walks the entire thing and converts it to XML
>>>>>> and displays the XML. For large infosets, this is excessive and can make
>>>>> it
>>>>>> impossible to use, even with some configurations that limit how much
>> of
>>>>>> that infoset is actually printed to the screen. Also things like
>> large
>>>>>> hex binary blobs create excessive and unusable output.
>>>>>>
>>>>>> Another thing I feel is missing is a schema view. Right now it's very
>>>>>> difficult to know where in the schema Daffodil actually is.
>>>>>>
>>>>>> I think these issues just need some thought improvement. One could
>>>>>> imagine a better way to stringify our unparse buffers for debug. One
>>>>>> could image a way to receive infoset state changes so the debugger
>> can
>>>>>> track things like backtracks and remove infosets. One could image a
>> way
>>>>>> display the schema
>>>>>>
>>>>>> We just need a better way to stringify the current state of the
>> unparse
>>>>>> data including buffers, and we need a way to for the debugger to
>>>>> receive
>>>>>> state change information about infoset so it can update displays
>> rather
>>>>>> than just constantly printing the entire infoset.
>>>>>>
>>>>>> However, I think another other big issue is just usability in
>> general.
>>>>> I
>>>>>> think the CLI usage is reasonable, but it's not always user friendly,
>>>>>> and is difficult to view multiple things at the same time. I think
>>>>>> because of this very few people even use this tool. So this this like
>>>>>> perhaps something worth focus.
>>>>>>
>>>>>> My first thought to improving this usability issue would be to
>>>>> implement
>>>>>> the Debug Adapter Protocol (DAP)
>>>>>> (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
>>>>>> which many IDE's implement. With this implemented, Daffodil could be
>>>>>> plugged in to any IDE that supports it and essentially get debugging
>>>>> for
>>>>>> free, without the need to worry about the GUI elements.
>>>>>>
>>>>>> I do have concerns that this just wouldn't have enough functionality
>>>>>> that we'd really need. For example, DAP really only has ability show
>>>>>> code (Daffodil's equivalent is the DFDL schema). There isn't a way to
>>>>>> show a live view of the infoset or data. Most DAP IDE's do have a
>>>>>> console output, so we could potentially make it so the console output
>>>>> is
>>>>>> a live view of infoset/data. But I'm not even sure most DAP friendly
>>>>>> IDE's could support this kindof console output. Does anyone have
>>>>>> familiarity with DAP IDE's or and what kinds of console capabilities
>>>>> are
>>>>>> available?
>>>>>>
>>>>>> I also looked into TUI libraries with the idea that we could just
>>>>> extend
>>>>>> our current debugger user interface to be a bit friendlier.
>>>>>> Unfortunately, there aren't too many Java/Scala TUI libraries and
>> those
>>>>>> that do exist don't have Apache friendly licenses. We also want to be
>>>>>> careful about increase dependencies just for a debugger than many
>>>>> people
>>>>>> might not use, so large graphics libraries are probably out of the
>>>>> question.
>>>>>>
>>>>>> This allo makes me wonder if an approach worth taking for the future
>> of
>>>>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
>>>>>> Protocol". I imagine it would be loosely based on DAP (which is
>>>>>> essentially JSON message based) but could be targeted to the things
>>>>> that
>>>>>> a DFDL schema debugger would really need. An added benefit with some
>>>>>> sort of protocol is the debugger interface can be uncoupled from
>>>>>> Daffodil itself, so we could implement a TUI/GUI/whatever in any
>>>>>> language/GUI framework and just have it communicate the protocol over
>>>>>> some form of IPC. Another benefit is that any future backends could
>>>>>> implement this protocol and so a single debugger could hook into
>>>>>> different backends without much issue. Unfortunately, defining such a
>>>>>> protocol might be a large task, but we do have our existing debug
>>>>>> infrastructure and things like DAP to guide its development/design.
>>>>>>
>>>>>> Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
>> we
>>>>>> really just need the few improvements mentioned to the existing
>>>>>> debugger. Is that enough to make it usable? Or is an entirely
>> different
>>>>>> approach needed to debugging schemas?
>>>>>>
>>>>>
>>>>>
>>
>


Re: The future of the daffodil DFDL schema debugger?

Posted by Adam Rosien <ad...@rosien.net>.
Sorry for the threading madness, my default of markdown quoting doesn't
interact well with mailing lists like this...

On Thu, Apr 22, 2021 at 11:14 AM Adam Rosien <ad...@rosien.net> wrote:

>
>
> On Thu, Apr 22, 2021 at 7:03 AM Steve Lawrence <sl...@apache.org>
> wrote:
>
>> Some thoughts related to showing the infoset as if it were a variable,
>> as this is prototyped:
>>
>> 1) How do DAP/IDEs represent very large hierarchical data? Infosets can
>> be huge, and most of the time a user only cares about the most recent
>> infoset item. So some way to follow and show just the most recent part of
>> the infoset is important. The current Daffodil debugger has an
>> "infosetLines" setting so that it only shows the most recent X number of
>> lines, which is mostly all a user cares about when stepping through a
>> parse.
>>
>
> DAP Variables, if nested, can be lazily loaded, with child counts and
> offsets used to page through them.
>
> > If the number of named or indexed children is large, the numbers should
> > be returned via the optional ‘namedVariables’ and ‘indexedVariables’
> > attributes.
>
>  - https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
>
> Or, as in the current behavior, only a window into the infoset could be
> reported.
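A minimal sketch of the lazy loading described above, assuming the infoset is held as a hypothetical nested Python structure (dicts for complex elements, lists for arrays). The `InfosetVariables` class and its methods are illustrative names only, not Daffodil or DAP library APIs; only the field names `variablesReference`, `namedVariables`, `indexedVariables`, `start` and `count` are taken from the DAP specification.

```python
from itertools import count as id_counter


class InfosetVariables:
    """Hypothetical handle table mapping variablesReference ids to infoset nodes."""

    def __init__(self, root):
        self._ids = id_counter(1)  # 0 is reserved: it means "no children"
        self._handles = {}
        self.root_ref = self._register(root)

    def _register(self, node):
        ref = next(self._ids)
        self._handles[ref] = node
        return ref

    def _to_variable(self, name, node):
        var = {"name": str(name), "variablesReference": 0}
        if isinstance(node, dict):
            # Complex element: report only the child count, not the children.
            var["value"] = "{...}"
            var["namedVariables"] = len(node)
            var["variablesReference"] = self._register(node)
        elif isinstance(node, list):
            # Array: children are fetched later, possibly one page at a time.
            var["value"] = f"[{len(node)} items]"
            var["indexedVariables"] = len(node)
            var["variablesReference"] = self._register(node)
        else:
            var["value"] = str(node)
        return var

    def variables(self, ref, start=0, count=0):
        """Body of a 'variables' response for one level; count=0 means all."""
        node = self._handles[ref]
        items = list(node.items()) if isinstance(node, dict) else list(enumerate(node))
        end = start + count if count else len(items)
        return [self._to_variable(k, v) for k, v in items[start:end]]


# Made-up infoset: only the requested level or page is ever converted.
infoset = {"header": {"version": 1}, "records": [{"id": i} for i in range(1000)]}
table = InfosetVariables(infoset)
top = table.variables(table.root_ref)
page = table.variables(top[1]["variablesReference"], start=10, count=3)
```

With this shape, the top-level request returns two entries, and the 1000 records are only touched three at a time when the client asks for a page. A real adapter would also need to discard handles when execution resumes, which this sketch leaves out.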
>
>
>>
>> 2) Infoset items are added and removed very frequently during a parse.
>> Currently, when the Daffodil debugger shows the infoset it just converts
>> the entire thing to XML and displays that. This doesn't work at all for
>> large infosets since this can take a long time. I was hoping this issue
>> would get resolved with this new debugging infrastructure. When the
>> infoset is modified, we ideally want a way to specify via DAP that parts
>> of the variable hierarchy were added/removed rather than having to send
>> the entire infoset during every variable update.
>>
>
> As I understand it, DAP only requests the current state when the debugger
> is stopped (due to a breakpoint, stepping, etc.):
>
> > Whenever the program stops (on program entry, because a breakpoint was
> > hit, an exception occurred, or the user requested execution to be
> > paused), the debug adapter sends a stopped event with the appropriate
> > reason and thread id.
> >
> > Upon receipt, the development tool first requests the threads (see
> > below) and then the stacktrace (a list of stack frames) for the thread
> > mentioned in the stopped event. If the user then drills into the stack
> > frame, the development tool first requests the scopes for a stack frame,
> > and then the variables for a scope. If a variable is itself structured,
> > the development tool requests its properties through additional
> > variables requests.
>
>   - https://microsoft.github.io/debug-adapter-protocol/overview
> ("Stopping and accessing debuggee state")
>
> Large data like infosets could be lazily transferred, or some window into
> it sent.
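The request order quoted above can be sketched as follows. `FakeAdapter` is a hypothetical in-memory stand-in for a Daffodil DAP server, and its canned responses (thread name, frame name, scope name, values) are made up for illustration; the command names and the required response fields follow the DAP specification.

```python
class FakeAdapter:
    """Hypothetical stand-in for a debug adapter; all data here is invented."""

    def stopped_event(self):
        return {"type": "event", "event": "stopped",
                "body": {"reason": "step", "threadId": 1}}

    def handle(self, command, arguments=None):
        if command == "threads":
            return {"threads": [{"id": 1, "name": "parse"}]}
        if command == "stackTrace":
            return {"stackFrames": [
                {"id": 10, "name": "record", "line": 42, "column": 1}]}
        if command == "scopes":
            return {"scopes": [
                {"name": "Infoset", "variablesReference": 100, "expensive": True}]}
        if command == "variables":
            return {"variables": [
                {"name": "id", "value": "7", "variablesReference": 0}]}
        raise ValueError(f"unsupported command: {command}")


def inspect_after_stop(adapter):
    """Issue requests in the order a client uses after a 'stopped' event."""
    order = []
    thread_id = adapter.stopped_event()["body"]["threadId"]

    order.append("threads")
    adapter.handle("threads")

    order.append("stackTrace")
    frames = adapter.handle("stackTrace", {"threadId": thread_id})["stackFrames"]

    order.append("scopes")
    scopes = adapter.handle("scopes", {"frameId": frames[0]["id"]})["scopes"]

    order.append("variables")
    ref = scopes[0]["variablesReference"]
    variables = adapter.handle("variables", {"variablesReference": ref})["variables"]
    return order, variables


order, variables = inspect_after_stop(FakeAdapter())
```

Because every request is driven by the client after a stop, the adapter never has to push the whole infoset; it only answers for the scope or variable the user actually opens.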
>
>
>>
>> 3) I can imagine a feature where a user would want to select an infoset
>> item and jump to the associated schema element, or query information
>> about that infoset item (e.g., what bit position did it start at, what
>> was the length). We don't have this right now, but it would be really
>> nice to have. This suggests that we need metadata associated with each of
>> the variables. Does DAP have a concept of that, and do IDEs have a way to
>> show it?
>>
>
> From what I can tell, DAP doesn't cover any view-related interaction with
> the debugger state. You can perform actions like "setVariable" if a user
> wants to override a reported value (not sure we'd want this, but just
> pointing it out), but there isn't a "jump to this resource at this
> location" view command defined within DAP.
>
> However, the VS Code extensions I previously mentioned *do* implement
> functionality similar to "jump to this resource at this location". I
> believe VS Code reacts to debugger events. For example, when a breakpoint
> is reached, the debuggee provides the current stacktrace; if the user
> then selects a particular stack frame, that frame has a reference to the
> associated "source", which the UI can display. In the case of
> stacktrace-as-schema-processing, each frame would correspond to the
> location of a schema element, and the UI could focus on that location.
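A small sketch of the stacktrace-as-schema-processing idea, assuming the path from the schema root to the current element is available as a list of (element name, line, column) tuples. The function name, the input shape and the schema path are hypothetical; the `id`, `name`, `source`, `line` and `column` fields are the ones defined for the DAP StackFrame type.

```python
def schema_path_to_stack_frames(path, schema_file):
    """Turn a root-to-current schema path into DAP-style stack frames.

    `path` is a hypothetical list of (element_name, line, column) tuples,
    ordered from the schema root down to the element being processed.
    """
    source = {"name": schema_file.rsplit("/", 1)[-1], "path": schema_file}
    frames = []
    # The current element becomes the first frame, as the innermost call would.
    for frame_id, (name, line, column) in enumerate(reversed(path), start=1):
        frames.append({
            "id": frame_id,
            "name": name,
            "source": source,
            "line": line,
            "column": column,
        })
    return frames


frames = schema_path_to_stack_frames(
    [("file", 10, 3), ("record", 24, 7), ("timestamp", 31, 11)],
    "/schemas/example.dfdl.xsd",  # made-up path for illustration
)
```

Selecting any of these frames gives the UI a file, line and column to open, which is the hook the extensions mentioned above rely on.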
>
>
>>
>> On 4/21/21 7:52 PM, Adam Rosien wrote:
>> > I've been reading up on DAP and wanted to share...
>> >
>> >> There are many areas though that are unique to Daffodil that have no
>> >> representation in the spec.  These things (like InputStream, Infoset,
>> >> PoU, different variable types, backtracking, etc) will need an
>> >> extension to DAP.  This really boils down to defining these things to
>> >> fit under the DAP BaseProtocol and enabling handling of those objects
>> >> on both the front and back ends.
>> >
>> > To me, much of the current state exposed by the (Daffodil) Debugger
>> > translates directly to a DAP Variable [1]. DAP Variables can be
>> > nested/hierarchical, so they could (potentially) model larger data like
>> > the infoset. I can imagine shoving all the current state into Variables
>> > as a proof-of-concept.
>> >
>> > It also seems like the processing stack maintained by the Daffodil
>> > PState, where each item references the relevant schema element, could
>> > translate to the DAP StackFrame type [2]. That is, the path from the
>> > schema root to the currently processing schema element becomes the
>> > "call stack". (Apologies if I don't have all the Daffodil terms lined up
>> > correctly.)
>> >
>> > For displaying the input data and processing progress, I looked at a few
>> > existing VS Code extensions that provide non-builtin views, some of
>> > which interact with their DAP debugger code [3] [4] [5] [6].
>> >
>> > Finally, I took a cursory look at scala-debug-adapter [7], which, for
>> > reference, wraps Microsoft's java-debug implementation of DAP [8]. I was
>> > curious about the set of request/response and event types. Additionally,
>> > the TypeScript API to VS Code offers custom DAP requests and responses,
>> > but I couldn't find the equivalent notion in the java-debug project.
>> >
>> > .. Adam
>> >
>> > [1] https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
>> > [2] https://microsoft.github.io/debug-adapter-protocol/specification#Types_StackFrame
>> > [3] https://github.com/scalameta/metals-vscode (provides a debugger and
>> >     non-debugger custom UI)
>> > [4] https://github.com/microsoft/vscode-cpptools (debugger + memory view)
>> > [5] https://marketplace.visualstudio.com/items?itemName=marus25.cortex-debug
>> >     (debugger + memory view,
>> >     https://github.com/Marus/cortex-debug/blob/master/src/frontend/memory_content_provider.ts)
>> > [6] https://marketplace.visualstudio.com/items?itemName=slevesque.vscode-hexdump
>> >     (extension for hexdumps that could be controlled by other extensions)
>> > [7] https://github.com/scalacenter/scala-debug-adapter
>> > [8] https://github.com/microsoft/java-debug
>> >
>> > On Tue, Apr 20, 2021 at 7:08 AM John Wass <jw...@gmail.com> wrote:
>> >
>> >>> Going to look deeper into how DAP might fit with Daffodil
>> >>
>> >> Have been looking over DAP and getting a good feeling about it. The
>> >> specification [1] seems general enough that it could be applied to
>> >> Daffodil and cover a swath of common operations (like start, stop,
>> >> break, continue, code locations, variables, etc).
>> >>
>> >> There are many areas though that are unique to Daffodil that have no
>> >> representation in the spec.  These things (like InputStream, Infoset,
>> >> PoU, different variable types, backtracking, etc) will need an
>> >> extension to DAP.  This really boils down to defining these things to
>> >> fit under the DAP BaseProtocol and enabling handling of those objects
>> >> on both the front and back ends.
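For reference, the DAP base protocol frames every message as a `Content-Length` header, a blank line, and a UTF-8 JSON body. The sketch below shows that framing applied to a custom event; the event name `daffodil.infoset` and its body are hypothetical, chosen only to illustrate what a Daffodil-specific extension message might look like.

```python
import json


def encode_message(message):
    """Frame one protocol message: Content-Length header, blank line, JSON body."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data):
    """Return (message, remaining_bytes), or (None, data) if more bytes are needed."""
    header_end = data.find(b"\r\n\r\n")
    if header_end < 0:
        return None, data
    length = None
    for line in data[:header_end].decode("ascii").split("\r\n"):
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    start = header_end + 4
    if len(data) < start + length:
        return None, data
    body = data[start:start + length].decode("utf-8")
    return json.loads(body), data[start + length:]


# Hypothetical Daffodil-specific event; not part of the DAP specification.
event = {"seq": 1, "type": "event", "event": "daffodil.infoset",
         "body": {"xml": "<record><id>7</id></record>"}}
wire = encode_message(event)
decoded, rest = decode_message(wire + b"Content-Le")  # trailing partial header
```

Since the framing does not care what the JSON contains, extension messages can travel over the same channel as standard ones; the work is in getting both ends to agree on what to do with them.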
>> >>
>> >> On the backend we need a Daffodil DAP protocol server.  Existing JVM
>> >> implementations (like Java [2], Scala [3]) are tied closely to JDI and
>> >> would bring a lot of extra baggage to work around that.  Developing a
>> >> Daffodil specific implementation is no small task, but feasible.  There
>> >> are several existing implementations on the JVM that are close and can
>> >> be looked at for reference.
>> >>
>> >> The backend implementation would look similar to what was described in
>> >> an earlier post.  We could use ZIO/Akka/etc to implement the backend
>> >> Protocol Server to enable the IO between the Daffodil process and the
>> >> DAP clients.  This implementation would now be guided by the DAP
>> >> specification.
>> >>
>> >> With the protocol and backend extended to fit Daffodil that leaves the
>> >> frontend.  In theory an existing IDE plugin should get pretty close to
>> >> being able to perform the common debug operations mentioned above.  To
>> >> support the Daffodil extensions there will need to be handling of the
>> >> extended protocol into whatever views are desired/applicable.
>> >>
>> >>> Also looking into the Java Debug Interface (JDI) for comparison.
>> >>
>> >> JDI appears to be the wrong level of abstraction for what we are
>> >> talking about in debugging Daffodil for schema development.  While DAP
>> >> does do JVM debugging (through a JDI DAP impl) it also generalizes to
>> >> many other debugging scenarios.  JDI on the other hand is very tied to
>> >> the JVM.
>> >>
>> >> Extending the JDI appears to be more complex than dealing with DAP, and
>> >> even though the JDI API is mostly defined with interfaces, there are
>> >> choke points that limit it to JVM concepts.  For example jdi.Value has
>> >> a finite set of JVM types that it works with; it's not clear where
>> >> Daffodil types would plug in, if that is even possible.
>> >>
>> >> The final note is that unique Daffodil features wouldn’t get to IDE
>> >> support any faster with JDI.  In some cases, like VS Code, you would
>> >> still need an extended DAP to support these features.
>> >>
>> >>> and depending on how it shakes out will update the example to show
>> >>> integration
>> >>
>> >> It would appear wise to investigate DAP further.  Next step is to
>> >> refine these thoughts with a prototype. I started an implementation in
>> >> the example debugger project [4] to try to run the current example on a
>> >> _minimal_ DAP implementation.
>> >>
>> >>
>> >> [1] https://microsoft.github.io/debug-adapter-protocol/specification
>> >> [2] https://github.com/Microsoft/java-debug
>> >> [3] https://github.com/scalacenter/scala-debug-adapter
>> >> [4] https://github.com/jw3/example-daffodil-debug
>> >>
>> >>
>> >> On Mon, Apr 12, 2021 at 9:58 AM John Wass <jw...@gmail.com> wrote:
>> >>
>> >>>> the code is here https://github.com/jw3/example-daffodil-debug
>> >>>
>> >>> There is now a complete console based example for Zio that
>> >>> demonstrates controlling the debug flow while distributing the current
>> >>> state to three "displays".
>> >>> 1. infoset at current step
>> >>> 2. diff of infoset against previous step
>> >>> 3. bit position and value of data.
>> >>>
>> >>> These displays are very rudimentary but demonstrate the ability to
>> >>> asynchronously populate multiple views while synchronously controlling
>> >>> the debug loop.
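As an illustration of the second display (diff of the infoset against the previous step), here is a minimal sketch using Python's standard `difflib`. It assumes each step's infoset is available as an XML string; the example project may compute its diff differently, and the sample XML is invented.

```python
import difflib


def infoset_diff(previous_xml, current_xml):
    """Return only the infoset lines added (+) or removed (-) since the last step."""
    lines = list(difflib.unified_diff(
        previous_xml.splitlines(), current_xml.splitlines(),
        fromfile="previous step", tofile="current step", lineterm=""))
    # Skip the two file-header lines, then drop hunk markers and context lines.
    return [line for line in lines[2:] if line.startswith(("+", "-"))]


# Made-up infosets for two consecutive steps.
before = "<record>\n  <id>7</id>\n</record>"
after = "<record>\n  <id>7</id>\n  <name>abc</name>\n</record>"

added = infoset_diff(before, after)      # a parse step added an element
removed = infoset_diff(after, before)    # a backtrack removed it again
```

This still serializes the whole infoset at every step, so it only demonstrates the display; it does not address the cost concern raised elsewhere in the thread.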
>> >>>
>> >>>> - The new protocol being informed by existing debugger and DAP is key
>> >>>
>> >>> Going to look deeper into how DAP might fit with Daffodil, and
>> >>> depending on how it shakes out will update the example to show
>> >>> integration.
>> >>>
>> >>> Some interesting links to start with
>> >>> - https://github.com/scalacenter/scala-debug-adapter
>> >>> - https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
>> >>> - https://github.com/microsoft/java-debug
>> >>>
>> >>> Also looking into the Java Debug Interface (JDI) for comparison.
>> >>>
>> >>>
>> >>> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:
>> >>>
>> >>>> Revisiting this post after doing some debugger related work and
>> >>>> thinking about debug protocol/adapters to connect external tooling to
>> >>>> the debug process.
>> >>>>
>> >>>> This comment is good
>> >>>>
>> >>>>> This also makes me wonder if an approach worth taking for the future
>> >>>>> of Daffodil schema debugging is developing a sort of "Daffodil Debug
>> >>>>> Protocol". I imagine it would be loosely based on DAP (which is
>> >>>>> essentially JSON message based) but could be targeted to the things
>> >>>>> that a DFDL schema debugger would really need. An added benefit with
>> >>>>> some sort of protocol is the debugger interface can be uncoupled from
>> >>>>> Daffodil itself, so we could implement a TUI/GUI/whatever in any
>> >>>>> language/GUI framework and just have it communicate the protocol over
>> >>>>> some form of IPC. Another benefit is that any future backends could
>> >>>>> implement this protocol and so a single debugger could hook into
>> >>>>> different backends without much issue. Unfortunately, defining such a
>> >>>>> protocol might be a large task, but we do have our existing debug
>> >>>>> infrastructure and things like DAP to guide its development/design.
>> >>>>
>> >>>> Some thoughts on this
>> >>>> - Defining the protocol will be a large task, but a minimal version
>> >>>>   should get up and round tripping quickly with a minimal subset of
>> >>>>   the protocol.
>> >>>> - The new protocol being informed by existing debugger and DAP is key
>> >>>> - Uncoupling from Daffodil is key
>> >>>> - Adapt the Daffodil protocol to produce DAP after the fact so as not
>> >>>>   to constrain Daffodil debugging capability
>> >>>> - We don't need to tie the protocol or adapters to a single framework;
>> >>>>   implementations of the IO layer should be simple enough to support
>> >>>>   multiple things (eg Akka, Zio, "basic" ...)
>> >>>> - The current debugger lives in runtime1, but can we make an abstract
>> >>>>   API that any runtime would implement?
>> >>>>
>> >>>> Maybe a solution is structured like this
>> >>>> - daffodil-debug-api:
>> >>>>   - protocol model
>> >>>>   - interfaces: debugger / IO adapter / etc
>> >>>>   - lives in daffodil repo (new subproject?)
>> >>>> - daffodil-debug-io-NAME
>> >>>>   - provides implementation of a specific IO adapter
>> >>>>   - multiple projects possible (daffodil-debugger-akka,
>> >>>>     daffodil-debugger-zio, etc)
>> >>>>   - supported ones live in their own subprojects, but others can be
>> >>>>     plugged in from external sources
>> >>>>   - ability to support multiple implementations reduces risk of
>> >>>>     lock-in
>> >>>> - debugger applications
>> >>>>   - maintained in external repositories
>> >>>>   - depending on the IO implementation these could execute in a
>> >>>>     separate process or on a separate machine
>> >>>>   - like Steve said, could be any language / framework
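The layering proposed above (a protocol model plus interfaces in an API module, with pluggable IO adapters) might look roughly like the sketch below. Every name in it (`StepEvent`, `DebugIO`, `ScriptedIO`, `run_debug_loop`) is hypothetical and written in Python only for brevity; it is not the structure of any existing Daffodil module.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class StepEvent:
    """Hypothetical protocol-model message emitted once per debug step."""
    schema_location: str
    bit_position: int


class DebugIO(ABC):
    """Hypothetical IO adapter interface (what an Akka/Zio/basic adapter would implement)."""

    @abstractmethod
    def publish(self, event: StepEvent) -> None:
        """Send state to whatever front end is attached."""

    @abstractmethod
    def next_command(self) -> str:
        """Block until the front end says what to do next."""


class ScriptedIO(DebugIO):
    """A 'basic' adapter that replays canned commands, e.g. for tests."""

    def __init__(self, commands):
        self._commands = list(commands)
        self.published = []

    def publish(self, event):
        self.published.append(event)

    def next_command(self):
        return self._commands.pop(0) if self._commands else "quit"


def run_debug_loop(steps, io: DebugIO):
    """The runtime side: it only knows the interface, not the transport."""
    for event in steps:
        io.publish(event)
        if io.next_command() == "quit":
            break


io = ScriptedIO(["step", "quit"])
run_debug_loop([StepEvent("record", 0), StepEvent("id", 8), StepEvent("name", 40)], io)
```

Because `run_debug_loop` depends only on `DebugIO`, swapping the scripted adapter for a socket-based one would not change the runtime side, which is the lock-in argument made in the list above.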
>> >>>>
>> >>>> Three types of reference implementations / sample applications could
>> >>>> also guide the development of the API
>> >>>>   1. a replacement for the existing TUI debugger, expected to end up
>> >>>>      with at minimum the same functionality as the current one.
>> >>>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
>> >>>>   3. an IDE integration
>> >>>>
>> >>>> Thoughts?
>> >>>>
>> >>>> Also I'm working on some reference implementations of these concepts
>> >>>> using Akka and Zio.  Not quite ready to talk through it yet, but the
>> >>>> code is here https://github.com/jw3/example-daffodil-debug
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
>> >>>> wrote:
>> >>>>
>> >>>>> Yep, something like that seems very reasonable for dealing with
>> >>>>> large infosets. But it still feels like we run into usability
>> >>>>> issues. For example, what if a user wants to see more? We need some
>> >>>>> configuration options to control how much is elided. It's not big,
>> >>>>> but every new thing that needs configuration adds complexity and
>> >>>>> decreases usability.
>> >>>>>
>> >>>>> And I think the only reason we are trying to spend effort eliding
>> >>>>> things is because we're limited to this gdb-like interface where you
>> >>>>> can only print out a little information at a time.
>> >>>>>
>> >>>>> I think what would really help is to dump this gdb interface and
>> >>>>> instead use multiple windows/views. As a really close example to
>> >>>>> what I imagine, I recently came across this hex editor:
>> >>>>>
>> >>>>> https://www.synalysis.net/
>> >>>>>
>> >>>>> The screenshots are a bit small so it's not super clear, but this
>> >>>>> tool has one view for the data in hex, and one view for a tree of
>> >>>>> parsed results (which is very similar to our infoset). The "infoset"
>> >>>>> view has information like offset/length/value, and can be related
>> >>>>> back to the data view to find the actual bits.
>> >>>>>
>> >>>>> I imagine the "next generation daffodil debugger" to look much like
>> >>>>> this. As data is parsed, the infoset view fills up. This view could
>> >>>>> act like a standard GUI tree so you could collapse sections or
>> >>>>> scroll around to show just the parts you care about, and have search
>> >>>>> capabilities to quickly jump around. The advantage here is you no
>> >>>>> longer really need automated eliding or heuristics for what the user
>> >>>>> *might* care about. You just show the whole thing and let the user
>> >>>>> scroll around. As daffodil parses and backtracks, this tree grows or
>> >>>>> shrinks.
>> >>>>>
>> >>>>> I also imagine you could have a cursor moving around the hex view,
>> >>>>> so as daffodil moves around (e.g. scanning for delimiters,
>> >>>>> extracting integers), one could update this data view to show what
>> >>>>> daffodil is doing and where it is.
>> >>>>>
>> >>>>> I also imagine there could be other views as well. For example, a
>> >>>>> schema view to show where in the schema daffodil is, and to
>> >>>>> add/remove breakpoints. And an information view for things like
>> >>>>> variables, in-scope delimiters, PoU's, etc.
>> >>>>>
>> >>>>> The only reason I mention a debug protocol is that it would allow
>> >>>>> this GUI to be more easily written in something other than
>> >>>>> Java/Scala to take advantage of other GUI toolkits. It's been a long
>> >>>>> while since I've done anything with Java GUIs, but they seemed
>> >>>>> pretty poor the last time I looked at them. It would even allow for
>> >>>>> a TUI, which Java has little/no support for. It also enables things
>> >>>>> like remote debugging if a socket IPC was used. Though I'm not sure
>> >>>>> all of that is necessary. Just thinking what would be ideal, and it
>> >>>>> can always be pared back.
>> >>>>>
>> >>>>>
>> >>>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>> >>>>>> I don't think of it as a daffodil debug protocol, but just a
>> >>>>>> separation of concerns between display of information and the
>> >>>>>> behaviors of parse/unparse that need to be points where users can
>> >>>>>> pause, and data structures available to display.
>> >>>>>>
>> >>>>>> E.g., it is 100% a display issue that the infoset (shown as XML) is
>> >>>>>> clumsy, too big, etc.  The infoset is available in the processor
>> >>>>>> state, and one can examine the current node, enclosing node, prior
>> >>>>>> sibling(s), following sibling(s), etc. One can elide contents that
>> >>>>>> are too big for hexBinary, etc.
>> >>>>>>
>> >>>>>> I think this problem, how to display the infoset with sensible
>> >>>>>> limits on sizing, is fairly easy to come up with some design for,
>> >>>>>> that will at least be (1) always fairly small (2) much more useful
>> >>>>>> in more cases. It won't be perfect but can be much better than what
>> >>>>>> we do now.
>> >>>>>>
>> >>>>>> One sensible display "mode" should be that displaying the context
>> >>>>>> surrounding the current element (when parsing or unparsing)
>> >>>>>> displays at most N lines (N/2 before, N/2 after), each with a
>> >>>>>> maximum length of L characters (settable within reason?).
>> >>>>>>
>> >>>>>> Sibling and enclosing nodes would be displayed eliding their
>> >>>>>> contents to at most 1 line.
>> >>>>>>
>> >>>>>> Here's an example of what I mean. Displaying up to N=10 lines total:
>> >>>>>>
>> >>>>>> ...
>> >>>>>> <enclosingParent1>
>> >>>>>>    ...
>> >>>>>>    <priorSibling2>89ab782 ...</...>
>> >>>>>>    <priorSibling1>some text is here and some more text</...>
>> >>>>>>    <currentNode>value might be some big thing which needs to be elided ...</...>
>> >>>>>>    <followingSibling1> ... </...>
>> >>>>>>    ???
>> >>>>>> </enclosingParent1>
>> >>>>>> ???
>> >>>>>>
>> >>>>>> The </...> is just an idea to reduce XML matching end-tag clutter.
>> >>>>>>
>> >>>>>> The ... on a line alone or where element content would appear
>> >>>>>> generally means 1 or more other siblings. The way the display above
>> >>>>>> starts with ... means that this is a relative inner nest, not
>> >>>>>> starting from the absolute root.
>> >>>>>>
>> >>>>>> The ... within simple content means that content is elided to fit
>> >>>>>> on one line. It always follows some text characters to
>> >>>>>> differentiate it from the child-element context.
>> >>>>>>
>> >>>>>> The ??? means zero or more other siblings.
>> >>>>>>
>> >>>>>> I used bold italic above to point out that the current node would
>> >>>>>> be highlighted somehow. Probably a way to do this that doesn't
>> >>>>>> require display modes would be useful. E.g., a text marker like
>> >>>>>> ">>>" as in:
>> >>>>>>
>> >>>>>>>>> <currentNode>value .... </...>
>> >>>>>>
>> >>>>>> might be better, particularly for a trace output being dumped to a
>> >>>>>> text file.
>> >>>>>>
>> >>>>>> I made the above example an unparser kind of example by showing a
>> >>>>>> following sibling that exists that is after the current node.
>> >>>>>>
>> >>>>>> I think the key concept is that any sibling node is displayed in a
>> >>>>>> way that fits on one line.
>> >>>>>> E.g., even if the element name was really long, I'd suggest:
>> >>>>>>
>> >>>>>>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>> >>>>>>
>> >>>>>> Where the element name itself gets elided because it is too long.
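The one-line-per-sibling display proposed above can be sketched as follows. This is only an illustration of the idea under assumed defaults: the function names, the 40-character line limit, the 16-character name limit, and the input shape (a list of (name, value) pairs) are all hypothetical.

```python
def elide_line(name, value, max_chars=40, max_name=16):
    """Render one sibling on a single line, eliding the name and/or value."""
    if len(name) > max_name:
        name = name[:max_name] + "..."
    line = f"<{name}>{value}</...>"
    if len(line) > max_chars:
        # Keep as much of the value as fits alongside the " ..." marker.
        budget = max(max_chars - len(f"<{name}> ...</...>"), 0)
        line = f"<{name}>{value[:budget]} ...</...>"
    return line


def context_window(siblings, current, max_lines=10):
    """Show at most max_lines siblings around `current` (not counting '...' markers)."""
    before = max_lines // 2
    after = max_lines - before - 1
    lo = max(current - before, 0)
    hi = min(current + after + 1, len(siblings))
    out = ["..."] if lo > 0 else []
    for i in range(lo, hi):
        marker = ">>> " if i == current else "    "
        out.append(marker + elide_line(*siblings[i]))
    if hi < len(siblings):
        out.append("...")
    return out


long_value = elide_line("currentNode", "value might be some big thing which needs to be elided")
long_name = elide_line("hereIsAnElementWithASuperLongName", "abcd")
view = context_window([(f"e{i}", str(i)) for i in range(20)], current=10, max_lines=4)
```

Here `view` contains a leading "..." line, siblings e8 through e11 with e10 marked by ">>>", and a trailing "..." line. Nested parents and the "???" marker for "zero or more siblings" are left out to keep the sketch short.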
>> >>>>>>
>> >>>>>> A thought. Note that the above presentation is shown as quasi-XML,
>> >>>>>> but there's nothing XML-specific about it. A JSON-friendly
>> >>>>>> equivalent could be done as well:
>> >>>>>>
>> >>>>>> enclosingParent1 = {
>> >>>>>>    ...
>> >>>>>>    priorSibling2 = "89ab782..."
>> >>>>>>    priorSibling1 = "some text is here and some more text"
>> >>>>>>    currentNode = "value might be some big thing which needs to be elided ..."
>> >>>>>>    followingSibling1 = { ... }
>> >>>>>>    ???
>> >>>>>> }
>> >>>>>>
>> >>>>>> That's enough for 1 email thread on this debug topic.
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>
>> >
>>
>>

Re: The future of the daffodil DFDL schema debugger?

Posted by Adam Rosien <ad...@rosien.net>.
On Thu, Apr 22, 2021 at 7:03 AM Steve Lawrence <sl...@apache.org> wrote:

> Some thoughts related to showing the infoset as if it were a variable as
> this is prototyped:
>
> 1) How do DAP/IDE's represent very large hierarchical data? Infosets can
> be huge, and most of the time a user only cares about the most recent
> infoset item. So some way to follow and show just the most recent part of
> the infoset is important. The current Daffodil debugger has an
> "infosetLines" setting so that it only shows the most recent X number of
> lines, which is mostly all a user cares about when stepping through a parse.
>

DAP Variables, if nested, can be loaded lazily, with the client requesting
children by offset and count.

> If the number of named or indexed children is large, the numbers should
> be returned via the optional ‘namedVariables’ and ‘indexedVariables’
> attributes.

  - https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable

Or, as with the current behavior, only a window into the infoset could be
reported.
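
A minimal sketch of the lazy-loading idea, assuming a hypothetical Node
class as a stand-in for an infoset element (it is not a Daffodil type) and
a hypothetical VariableStore helper. Only the field names
(variablesReference, indexedVariables, start, count) come from the DAP
specification; treating infoset children as "indexed" for paging is a
simplification made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for an infoset element (not a Daffodil class)."""
    name: str
    value: Optional[str] = None  # set for simple elements
    children: List["Node"] = field(default_factory=list)


class VariableStore:
    """Hands out DAP variablesReference ids and serves children on demand."""

    def __init__(self) -> None:
        self._by_ref: Dict[int, Node] = {}
        self._next_ref = 1

    def register(self, node: Node) -> int:
        ref = self._next_ref
        self._next_ref += 1
        self._by_ref[ref] = node
        return ref

    def to_variable(self, node: Node) -> dict:
        if not node.children:
            # In DAP, variablesReference 0 means the variable has no children.
            return {"name": node.name, "value": node.value or "",
                    "variablesReference": 0}
        return {
            "name": node.name,
            "value": f"{len(node.children)} children",
            "variablesReference": self.register(node),
            # Child count, so a client can page instead of fetching everything.
            "indexedVariables": len(node.children),
        }

    def variables(self, ref: int, start: int = 0, count: int = 0) -> List[dict]:
        """Body of a 'variables' response; count == 0 means all children."""
        kids = self._by_ref[ref].children
        window = kids[start:] if count == 0 else kids[start:start + count]
        return [self.to_variable(k) for k in window]
```

With this shape, nothing below a container is serialized until a client
asks for it, and a "window into the infoset" is just a start/count pair.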


>
> 2) Infoset items are added and removed very frequently during a parse.
> Currently, when the Daffodil debugger shows the infoset it just converts
> the entire thing to XML and displays that. This doesn't work at all for
> large infosets since this can take a long time. I was hoping this issue
> would get resolved with this new debugging infrastructure. When the
> infoset is modified, we ideally want a way to specify via DAP that parts
> of the variable hierarchy were added/removed rather than having to send
> the entire infoset during every variable update.
>

As I understand it, DAP only requests the current state when the debugger
is stopped (due to a breakpoint, stepping, etc.):

> Whenever the program stops (on program entry, because a breakpoint was
> hit, an exception occurred, or the user requested execution to be paused),
> the debug adapter sends a stopped event with the appropriate reason and
> thread id.
>
> Upon receipt, the development tool first requests the threads (see below)
> and then the stacktrace (a list of stack frames) for the thread mentioned
> in the stopped event. If the user then drills into the stack frame, the
> development tool first requests the scopes for a stack frame, and then the
> variables for a scope. If a variable is itself structured, the development
> tool requests its properties through additional variables requests.

  - https://microsoft.github.io/debug-adapter-protocol/overview ("Stopping
and accessing debuggee state")

Large data like infosets could be transferred lazily, or only a window into
them sent.
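
The request order quoted above can be sketched as a toy adapter. The
message fields (seq, type, command, request_seq, body) and the
Content-Length framing follow the DAP specification; the STATE table, the
frame and scope names, and the handle/encode helpers are hypothetical and
only illustrate the flow, not how Daffodil would implement it.

```python
import itertools
import json

_seq = itertools.count(1)


def event(name: str, body: dict) -> dict:
    return {"seq": next(_seq), "type": "event", "event": name, "body": body}


def response(request: dict, body: dict) -> dict:
    return {"seq": next(_seq), "type": "response",
            "request_seq": request["seq"], "success": True,
            "command": request["command"], "body": body}


def encode(message: dict) -> bytes:
    """Frame a message the way DAP does: a Content-Length header, then JSON."""
    payload = json.dumps(message).encode("utf-8")
    return f"Content-Length: {len(payload)}\r\n\r\n".encode("ascii") + payload


# Hypothetical paused state: one "thread" for the parse, one frame, one scope.
STATE = {
    "threads": [{"id": 1, "name": "parse"}],
    "frames": {1: [{"id": 10, "name": "element: record", "line": 42, "column": 1}]},
    "scopes": {10: [{"name": "Infoset", "variablesReference": 100,
                     "expensive": False}]},
    "variables": {100: [{"name": "record", "value": "...",
                         "variablesReference": 0}]},
}


def handle(request: dict) -> dict:
    """Answer the requests a client sends after it receives 'stopped'."""
    cmd, args = request["command"], request.get("arguments", {})
    if cmd == "threads":
        return response(request, {"threads": STATE["threads"]})
    if cmd == "stackTrace":
        return response(request, {"stackFrames": STATE["frames"][args["threadId"]]})
    if cmd == "scopes":
        return response(request, {"scopes": STATE["scopes"][args["frameId"]]})
    if cmd == "variables":
        ref = args["variablesReference"]
        return response(request, {"variables": STATE["variables"][ref]})
    raise ValueError(f"unsupported command: {cmd}")
```

The adapter only sends event("stopped", {"reason": "step", "threadId": 1});
everything else is pulled by the client, which is why state is only ever
requested while the debuggee is paused.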


>
> 3) I can imagine a feature where a user would want to select an infoset
> item and jump to the associated schema element, or query information
> about that infoset item (e.g., what bit position did it start at, what
> was the length). We don't have this right now, but it would be really nice
> to have. This suggests that we need metadata associated with each of the
> variables. Does DAP have a concept of that and do IDE's have a way to
> show it?
>

From what I can tell, DAP doesn't cover any view-related interaction with
the debugger state. You can perform actions like "setVariable" if a user
wants to override a reported value (not sure we'd want this, but just
pointing it out), but there isn't a "jump to this resource at this
location" view command defined within DAP.

However, the VS Code extensions I previously mentioned *do* implement
functionality similar to "jump to this resource at this location". I
believe VS Code reacts to debugger events. For example, when a breakpoint
is reached, the debuggee provides the current stacktrace; if the user then
selects a particular stack frame, that frame has a reference to the
associated "source", which the UI can display. In the case of
stacktrace-as-schema-processing, each frame would correspond to the
location of a schema element, and the UI could focus on that location.
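
A rough sketch of stacktrace-as-schema-processing, assuming a hypothetical
SchemaLocation record for where each schema element is declared (the real
Daffodil state would have to supply this). The StackFrame field names (id,
name, source, line, column) are from the DAP specification; the file path
and element names in the usage below are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SchemaLocation:
    """Hypothetical record of where a schema element is declared."""
    element: str
    path: str
    line: int


def to_stack_frames(path_from_root: List[SchemaLocation]) -> List[dict]:
    """Turn the root-to-current schema path into DAP StackFrame objects."""
    frames = []
    # DAP lists the innermost (currently processing) frame first.
    for frame_id, loc in enumerate(reversed(path_from_root), start=1):
        frames.append({
            "id": frame_id,
            "name": loc.element,
            "source": {"name": loc.path.rsplit("/", 1)[-1], "path": loc.path},
            "line": loc.line,
            "column": 1,
        })
    return frames


frames = to_stack_frames([
    SchemaLocation("message", "/schemas/example.dfdl.xsd", 12),
    SchemaLocation("header", "/schemas/example.dfdl.xsd", 20),
    SchemaLocation("length", "/schemas/example.dfdl.xsd", 23),
])
```

Selecting a frame in the IDE would then open the "source" path at the
given line, which is the jump-to-schema behavior without any custom view.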


>
> On 4/21/21 7:52 PM, Adam Rosien wrote:
> > I've been reading up on DAP and wanted to share...
> >
> >> There are many areas though that are unique to Daffodil that have no
> >> representation in the spec.  These things (like InputStream, Infoset,
> >> PoU, different variable types, backtracking, etc) will need an extension
> >> to DAP.  This really boils down to defining these things to fit under the
> >> DAP BaseProtocol and enabling handling of those objects on both the front
> >> and back ends.
> >
> > To me, much of the current state exposed by the (Daffodil) Debugger
> > translates directly to a DAP Variable [1]. DAP Variables can be
> > nested/hierarchical, so they could (potentially) model larger data like
> > the infoset. I can imagine shoving all the current state into Variables
> > as a proof-of-concept.
> >
> > It also seems like the processing stack maintained by the Daffodil
> > PState, where each item references the relevant schema element, could
> > translate to the DAP StackFrame type [2]. That is, the path from the
> > schema root to the currently processing schema element becomes the "call
> > stack". (Apologies if I don't have all the Daffodil terms lined up
> > correctly.)
> >
> > For displaying the input data and processing progress, I looked at a few
> > existing VS Code extensions that provided non-builtin views, some of
> > which interact with their DAP debugger code [3] [4] [5] [6].
> >
> > Finally, I took a cursory look at scala-debug-adapter [7], which, for
> > reference, wraps Microsoft's java-debug [8] implementation of DAP. I was
> > curious about the set of request/response and event types. Additionally,
> > the TypeScript API to VS Code offers custom DAP requests and responses,
> > but I couldn't find the equivalent notion in the java-debug project.
> >
> > .. Adam
> >
> > [1] https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
> > [2] https://microsoft.github.io/debug-adapter-protocol/specification#Types_StackFrame
> > [3] https://github.com/scalameta/metals-vscode (provides a debugger and
> > non-debugger custom UI)
> > [4] https://github.com/microsoft/vscode-cpptools (debugger + memory view)
> > [5] https://marketplace.visualstudio.com/items?itemName=marus25.cortex-debug
> > (debugger + memory view,
> > https://github.com/Marus/cortex-debug/blob/master/src/frontend/memory_content_provider.ts)
> > [6] https://marketplace.visualstudio.com/items?itemName=slevesque.vscode-hexdump
> > (extension for hexdumps that could be controlled by other extensions)
> > [7] https://github.com/scalacenter/scala-debug-adapter
> > [8] https://github.com/microsoft/java-debug
> >
> > On Tue, Apr 20, 2021 at 7:08 AM John Wass <jw...@gmail.com> wrote:
> >
> >>> Going to look deeper into how DAP might fit with Daffodil
> >>
> >> Have been looking over DAP and getting a good feeling about it. The
> >> specification [1] seems general enough that it could be applied to
> >> Daffodil and cover a swath of common operations (like start, stop,
> >> break, continue, code locations, variables, etc).
> >>
> >> There are many areas though that are unique to Daffodil that have no
> >> representation in the spec.  These things (like InputStream, Infoset,
> >> PoU, different variable types, backtracking, etc) will need an extension
> >> to DAP.  This really boils down to defining these things to fit under the
> >> DAP BaseProtocol and enabling handling of those objects on both the front
> >> and back ends.
> >>
> >> On the backend we need a Daffodil DAP protocol server.  Existing JVM
> >> implementations (like Java [2], Scala [3]) are tied closely to JDI and
> >> would bring a lot of extra baggage to work around that.  Developing a
> >> Daffodil-specific implementation is no small task, but feasible.  There
> >> are several existing implementations on the JVM that are close and can
> >> be looked at for reference.
> >>
> >> The backend implementation would look similar to what was described in
> >> an earlier post.  We could use ZIO/Akka/etc to implement the backend
> >> Protocol Server to enable the IO between the Daffodil process and the
> >> DAP clients.  This implementation would now be guided by the DAP
> >> specification.
> >>
> >> With the protocol and backend extended to fit Daffodil that leaves the
> >> frontend.  In theory an existing IDE plugin should get pretty close to
> >> being able to perform the common debug operations mentioned above.  To
> >> support the Daffodil extensions there will need to be handling of the
> >> extended protocol into whatever views are desired/applicable.
> >>
> >>> Also looking into the Java Debug Interface (JDI) for comparison.
> >>
> >> JDI appears to be the wrong level of abstraction for what we are talking
> >> about in debugging Daffodil for schema development.  While DAP does do
> >> JVM debugging (through a JDI DAP impl) it also generalizes to many other
> >> debugging scenarios.  JDI on the other hand is very tied to the JVM.
> >>
> >> Extending the JDI appears to be more complex than dealing with DAP, and
> >> even though the JDI API is mostly defined with interfaces, there are
> >> choke points that limit it to JVM concepts.  For example jdi.Value has a
> >> finite set of JVM types that it works with; it's not clear where Daffodil
> >> types would plug in, if that is even possible.
> >>
> >> The final note is that unique Daffodil features wouldn’t get to IDE
> >> support any faster with JDI.  In some cases, like VS Code, you would
> >> still need an extended DAP to support these features.
> >>
> >>> and depending on how it shakes out will update the example to show
> >> integration
> >>
> >> It would appear wise to investigate DAP further.  Next step is to refine
> >> these thoughts with a prototype. I started an implementation in the
> >> example debugger project [4] to try to run the current example on a
> >> _minimal_ DAP implementation.
> >>
> >>
> >> [1] https://microsoft.github.io/debug-adapter-protocol/specification
> >> [2] https://github.com/Microsoft/java-debug
> >> [3] https://github.com/scalacenter/scala-debug-adapter
> >> [4] https://github.com/jw3/example-daffodil-debug
> >>
> >>
> >> On Mon, Apr 12, 2021 at 9:58 AM John Wass <jw...@gmail.com> wrote:
> >>
> >>>> the code is here https://github.com/jw3/example-daffodil-debug
> >>>
> >>> There is now a complete console based example for Zio that demonstrates
> >>> controlling the debug flow while distributing the current state to
> >>> three "displays".
> >>> 1. infoset at current step
> >>> 2. diff of infoset against previous step
> >>> 3. bit position and value of data.
> >>>
> >>> These displays are very rudimentary but demonstrate the ability to
> >>> asynchronously populate multiple views while synchronously controlling
> >>> the debug loop.
> >>>
> >>>> - The new protocol being informed by existing debugger and DAP is key
> >>>
> >>> Going to look deeper into how DAP might fit with Daffodil, and
> >>> depending on how it shakes out will update the example to show
> >>> integration.
> >>>
> >>> Some interesting links to start with
> >>> - https://github.com/scalacenter/scala-debug-adapter
> >>> - https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
> >>> - https://github.com/microsoft/java-debug
> >>>
> >>> Also looking into the Java Debug Interface (JDI) for comparison.
> >>>
> >>>
> >>> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:
> >>>
> >>>> Revisiting this post after doing some debugger related work and
> thinking
> >>>> about debug protocol/adapters to connect external tooling to the debug
> >>>> process.
> >>>>
> >>>> This comment is good
> >>>>
> >>>>> This allo makes me wonder if an approach worth taking for the future
> >> of
> >>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
> >>>> Protocol". I imagine it would be loosely based on DAP (which is
> >>>> essentially JSON message based) but could be targeted to the things
> >> that a
> >>>> DFDL schema debugger would really need. An added benefit with some
> >> sort of
> >>>> protocol is the debugger interface can be uncoupled from Daffodil
> >>>> itself, so we could implement a TUI/GUI/whatever in any  language/GUI
> >>>> framework and just have it communicate the protocol over some form of
> >>>> IPC. Another benefit is that any future backends could implement this
> >>>> protocol and so a single debugger could hook into different backends
> >>>> without much issue. Unfortunately, defining such a protocol might be a
> >>>> large task, but we do have our existing debug infrastructure and
> things
> >>>> like DAP to guide its development/design.
> >>>>
> >>>> Some thoughts on this
> >>>> - Defining the protocol will be a large task, but a minimal version
> >>>> should get up and round tripping quickly with a minimal subset of the
> >>>> protocol.
> >>>> - The new protocol being informed by existing debugger and DAP is key
> >>>> - Uncoupling from Daffodil is key
> >>>> - Adapt the Daffodil protocol to produce DAP after the fact so as not
> >>>> to constrain Daffodil debugging capability
> >>>> - We don't need to tie the protocol or adapters to a single framework;
> >>>> implementations of the IO layer should be simple enough to support
> >>>> multiple things (e.g. Akka, Zio, "basic" ...)
> >>>> - The current debugger lives in runtime1, but can we make an abstract
> >>>> API that any runtime would implement?
> >>>>
> >>>> Maybe a solution is structured like this
> >>>> - daffodil-debug-api:
> >>>>   - protocol model
> >>>>   - interfaces: debugger / IO adapter / etc
> >>>>   - lives in daffodil repo (new subproject?)
> >>>> - daffodil-debug-io-NAME
> >>>>   - provides implementation of a specific IO adapter
> >>>>   - multiple projects possible (daffodil-debugger-akka,
> >>>> daffodil-debugger-zio, etc)
> >>>>   - supported ones live in their own subprojects, but others can be
> >>>> plugged in from external sources
> >>>>   - ability to support multiple implementations reduces risk of
> >>>> lock-in
> >>>> - debugger applications
> >>>>   - maintained in external repositories
> >>>>   - depending on the IO implementation these could execute in a
> >>>> separate process or on a separate machine
> >>>>   - like Steve said, could be any language / framework
> >>>>
> >>>> Three types of reference implementations / sample applications could
> >> also
> >>>> guide the development of the API
> >>>>   1. a replacement for the existing TUI debugger, expected to end up
> >> with
> >>>> at minimum the same functionality as the current one.
> >>>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
> >>>>   3. an IDE integration
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Also I'm working on some reference implementations of these concepts
> >>>> using Akka and Zio.  Not quite ready to talk through it yet, but the
> >> code
> >>>> is here https://github.com/jw3/example-daffodil-debug
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Yep, something like that seems very reasonable for dealing with large
> >>>>> infosets. But it feels like we still run into usability issues.
> >>>>> For example, what if a user wants to see more? We need some
> >>>>> configuration options to increase what we've elided. It's not big,
> >>>>> but every new thing that needs configuration adds complexity and
> >>>>> decreases usability.
> >>>>>
> >>>>> And I think the only reason we are trying to spend effort eliding
> >>>>> things is because we're limited to this gdb-like interface where you
> >>>>> can only print out a little information at a time.
> >>>>>
> >>>>> I think what would really help is to dump this gdb interface and
> >>>>> instead use multiple windows/views. As a really close example to what
> >>>>> I imagine, I recently came across this hex editor:
> >>>>>
> >>>>> https://www.synalysis.net/
> >>>>>
> >>>>> The screenshots are a bit small so it's not super clear, but this
> tool
> >>>>> has one view for the data in hex, and one view for a tree of parsed
> >>>>> results (which is very similar to our infoset). The "infoset" view
> has
> >>>>> information like offset/length/value, and can be related back to the
> >>>>> data view to find the actual bits.
> >>>>>
> >>>>> I imagine the "next generation daffodil debugger" to look much like
> >>>>> this. As data is parsed, the infoset view fills up. This view could
> act
> >>>>> like a standard GUI tree so you could collapse sections or scroll
> >> around
> >>>>> to show just the parts you care about, and have search capabilities
> to
> >>>>> quickly jump around. The advantage here is you no longer really need
> >>>>> automated eliding or heuristics for what the user *might* care about.
> >>>>> You just show the whole thing and let user scroll around. As daffodil
> >>>>> parses and backtracks, this tree grows or shrinks.
> >>>>>
> >>>>> I also imagine you could have a cursor moving around the hex view, so
> >> as
> >>>>> daffodil moves around (e.g. scanning for delimiters, extracting
> >>>>> integers), one could update this data view to show what daffodil is
> >>>>> doing and where it is.
> >>>>>
> >>>>> I also imagine there could be other views as well. For example, a
> >>>>> schema view to show where in the schema daffodil is, and to add/remove
> >>>>> breakpoints. And an information view for things like variables,
> >>>>> in-scope delimiters, PoU's, etc.
> >>>>>
> >>>>> The only reason I mention a debug protocol is that it would allow this
> >>>>> GUI to be more easily written in something other than Java/Scala to
> >>>>> take advantage of other GUI toolkits. It's been a long while since
> >>>>> I've done anything with Java GUIs, but they seemed pretty poor the last
> >>>>> time I looked at them. It would even allow for a TUI, which Java has
> >>>>> little/no support for. It also enables things like remote debugging if
> >>>>> a socket IPC was used. Though I'm not sure all of that is necessary.
> >>>>> Just thinking about what would be ideal, and it can always be pared
> >>>>> back.
> >>>>>
> >>>>>
> >>>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> >>>>>> I don't think of it as a daffodil debug protocol, but just a
> >>>>> separation of concerns between display of information and the
> >> behaviors of
> >>>>> parse/unparse that need to be points where users can pause, and data
> >>>>> structures available to display.
> >>>>>>
> >>>>>> E.g., it is 100% a display issue that the infoset (shown as XML) is
> >>>>> clumsy, too big, etc.  The infoset is available in the processor
> >> state, and
> >>>>> one can examine the current node, enclosing node, prior sibling(s),
> >>>>> following sibling(s), etc. One can elide contents that are too big
> for
> >>>>> hexBinary, etc.
> >>>>>>
> >>>>>> I think this problem, how to display the infoset with sensible
> limits
> >>>>> on sizing, is fairly easy to come up with some design for, that will
> at
> >>>>> least be (1) always fairly small (2) much more useful in more cases.
> It
> >>>>> won't be perfect but can be much better than what we do now.
> >>>>>>
> >>>>>> One sensible display "mode" should be that displaying the context
> >>>>> surrounding the current element (when parsing or unparsing) displays
> at
> >>>>> most N-lines. (N/2 before, N/2 after) with a maximum length of L
> >> characters
> >>>>> (settable within reason ?)
> >>>>>>
> >>>>>> Sibling and enclosing nodes would be displayed eliding their
> contents
> >>>>> to at most 1 line.
> >>>>>>
> >>>>>> Here's an example of what I mean. Displaying up to N=10 lines total:
> >>>>>>
> >>>>>> ...
> >>>>>> <enclosingParent1>
> >>>>>>    ...
> >>>>>>    <priorSibling2>89ab782 ...</...>
> >>>>>>    <priorSibling1>some text is here and some more text</...>
> >>>>>>    <currentNode>value might be some big thing which needs to be
> >> elided
> >>>>> ...</...>
> >>>>>>    <followingSibling1> ... </...>
> >>>>>>    ???
> >>>>>> </enclosingParent1>
> >>>>>> ???
> >>>>>>
> >>>>>> The </...> is just an idea to reduce XML matching end-tag clutter.
> >>>>>>
> >>>>>> The ... on a line alone or where element content would appear
> >>>>> generally means 1 or more other siblings. The way the display above
> >> starts
> >>>>> with ... means that this is a relative inner nest, not starting from
> >> the
> >>>>> absolute root.
> >>>>>>
> >>>>>> The ... within simple content means that content is elided to fit on
> >>>>> one line. Always follows some text characters to differentiate from
> the
> >>>>> child-element context.
> >>>>>>
> >>>>>> The ??? means zero or more other siblings.
> >>>>>>
> >>>>>> I used bold italic above to point out that the current node would be
> >>>>> highlighted somehow. Probably a way to do this that doesn't require
> >> display
> >>>>> modes would be useful. E.g., a text marker like ">>>" as in:
> >>>>>>
> >>>>>>>>> <currentNode>value .... </...>
> >>>>>>
> >>>>>> might be better, particularly for a trace output being dumped to a
> >>>>> text file.
> >>>>>>
> >>>>>> I made the above example an unparser kind of example by showing a
> >>>>> following sibling that exists that is after the current node.
> >>>>>>
> >>>>>> I think the key concept is that any sibling node is displayed in a
> >> way
> >>>>> that fits on one line.
> >>>>>> E.g., even if the element name was really long, I'd suggest:
> >>>>>>
> >>>>>>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> >>>>>>
> >>>>>> Where the element name itself gets elided because it is too long.
> >>>>>>
> >>>>>> A thought. Note that the above presentation is shown as quasi-XML,
> >> but
> >>>>> there's nothing XML-specific about it. A JSON-friendly equivalent
> >> could be
> >>>>> done as well:
> >>>>>>
> >>>>>> enclosingParent1 = {
> >>>>>>    ...
> >>>>>>    priorSibling2 = "89ab782..."
> >>>>>>    priorSibling1 = "some text is here and some more text"
> >>>>>>    currentNode = "value might be some big thing which needs to be
> >>>>> elided ..."
> >>>>>>    followingSibling1 = { ... }
> >>>>>>    ???
> >>>>>> }
> >>>>>>
> >>>>>> That's enough for 1 email thread on this debug topic.
> >>>>>>
> >>>>>>
> >>>>>> ________________________________
> >>>>>> From: Steve Lawrence <sl...@apache.org>
> >>>>>> Sent: Tuesday, January 5, 2021 2:26 PM
> >>>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> >>>>>> Subject: The future of the daffodil DFDL schema debugger?
> >>>>>>
> >>>>>>
> >>>>>> Now that we're in a new year, I'd like to start a discussion about
> >> the
> >>>>>> Daffodil DFDL Schema debugger and how it might be improved to be
> more
> >>>>>> useful.
> >>>>>>
> >>>>>> Note that this is not the capabilities to debug Daffodil itself in
> >>>>>> something like Eclipse/IntelliJ, but the ability for Daffodil to
> >>>>> provide
> >>>>>> enough extra information during a parse/unparse so that a schema
> >>>>>> developer can get an idea of what Daffodil is doing. This makes it
> >>>>>> easier for users (rather than developers) to determine why a schema
> >>>>>> isn't giving the expect parse/unparse result (either because of bad
> >>>>> data
> >>>>>> or a faulty schema.
> >>>>>>
> >>>>>> The current state of the debugger is enabled by providing the
> --debug
> >>>>> or
> >>>>>> --trace flags in the CLI. More information about that here:
> >>>>>>
> >>>>>> https://daffodil.apache.org/debugger/
> >>>>>>
> >>>>>> This enables a TUI and commands somewhat similar to GDB, providing
> >>>>> thins
> >>>>>> like breakpoints, steps, displaying the current infoset, display a
> >> dump
> >>>>>> of the data, etc.
> >>>>>>
> >>>>>> Although I find this tool pretty useful, it definitely has some
> >> glaring
> >>>>>> issues.
> >>>>>>
> >>>>>> The most glaring to me is that it really isn't useful at all for
> >>>>>> debugging unparse. The data dumps only include then main
> >> outputstream,
> >>>>>> so determine things like suspensions and buffered output is
> >> impossible.
> >>>>>>
> >>>>>> Another issue is the infoset output. When outputting the infoset,
> the
> >>>>>> debugger currently just walks the entire thing and converts it to
> XML
> >>>>>> and displays the XML. For large infosets, this is excess and can
> make
> >>>>> it
> >>>>>> impossible to use, even with some configurations the limit how much
> >> of
> >>>>>> that infoset is actually printed to the screen. Also things like
> >> large
> >>>>>> hex binary blobs create excessive and unusable output.
> >>>>>>
> >>>>>> Another thing I feel is missing is a schema view. Right now it's
> very
> >>>>>> difficult to know where in the schema Daffodil actually is.
> >>>>>>
> >>>>>> I think these issues just need some thought improvement. One could
> >>>>>> imagine a better way to stringify our unparse buffers for debug. One
> >>>>>> could image a way to receive infoset state changes so the debugger
> >> can
> >>>>>> track things like backtracks and remove infosets. One could image a
> >> way
> >>>>>> display the schema
> >>>>>>
> >>>>>> We just need a better way to stringify the current state of the
> >> unparse
> >>>>>> data including buffers, and we need a way to for the debugger to
> >>>>> receive
> >>>>>> state change information about infoset so it can update displays
> >> rather
> >>>>>> than just constantly printing the entire infoset.
> >>>>>>
> >>>>>> However, I think another other big issue is just usability in
> >> general.
> >>>>> I
> >>>>>> think the CLI usage is reasonable, but it's not always user
> friendly,
> >>>>>> and is difficult to view multiple things at the same time. I think
> >>>>>> because of this very few people even use this tool. So this this
> like
> >>>>>> perhaps something worth focus.
> >>>>>>
> >>>>>> My first thought to improving this usability issue would be to
> >>>>> implement
> >>>>>> the Debug Adapter Protocol (DAP)
> >>>>>> (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
> >>>>>> which many IDE's implement. With this implemented, Daffodil could be
> >>>>>> plugged in to any IDE that supports it and essentially get debugging
> >>>>> for
> >>>>>> free, without the need to worry about the GUI elements.
> >>>>>>
> >>>>>> I do have concerns that this just wouldn't have enough functionality
> >>>>>> that we'd really need. For example, DAP really only has ability show
> >>>>>> code (Daffodil's equivalent is the DFDL schema). There isn't a way
> to
> >>>>>> show a live view of the infoset or data. Most DAP IDE's do have a
> >>>>>> console output, so we could potentially make it so the console
> output
> >>>>> is
> >>>>>> a live view of infoset/data. But I'm not even sure most DAP friendly
> >>>>>> IDE's could support this kindof console output. Does anyone have
> >>>>>> familiarity with DAP IDE's or and what kinds of console capabilities
> >>>>> are
> >>>>>> available?
> >>>>>>
> >>>>>> I also looked into TUI libraries with the idea that we could just
> >>>>> extend
> >>>>>> our current debugger user interface to be a bit friendlier.
> >>>>>> Unfortunately, there aren't too many Java/Scala TUI libraries and
> >> those
> >>>>>> that do exist don't have Apache friendly licenses. We also want to
> be
> >>>>>> careful about increase dependencies just for a debugger than many
> >>>>> people
> >>>>>> might not use, so large graphics libraries are probably out of the
> >>>>> question.
> >>>>>>
> >>>>>> This allo makes me wonder if an approach worth taking for the future
> >> of
> >>>>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
> >>>>>> Protocol". I imagine it would be loosely based on DAP (which is
> >>>>>> essentially JSON message based) but could be targeted to the things
> >>>>> that
> >>>>>> a DFDL schema debugger would really need. An added benefit with some
> >>>>>> sort of protocol is the debugger interface can be uncoupled from
> >>>>>> Daffodil itself, so we could implement a TUI/GUI/whatever in any
> >>>>>> language/GUI framework and just have it communicate the protocol
> over
> >>>>>> some form of IPC. Another benefit is that any future backends could
> >>>>>> implement this protocol and so a single debugger could hook into
> >>>>>> different backends without much issue. Unfortunately, defining such
> a
> >>>>>> protocol might be a large task, but we do have our existing debug
> >>>>>> infrastructure and things like DAP to guide its development/design.
> >>>>>>
> >>>>>> Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
> >> we
> >>>>>> really just need the few improvements mentioned to the existing
> >>>>>> debugger. Is that enough to make it usable? Or is an entirely
> >> different
> >>>>>> approach needed to debugging schemas?
> >>>>>>
> >>>>>
> >>>>>
> >>
> >
>
>

Re: The future of the daffodil DFDL schema debugger?

Posted by Steve Lawrence <sl...@apache.org>.
Some thoughts related to showing the infoset as if it were a variable as
this is prototyped:

1) How do DAP/IDE's represent very large hierarchical data? Infosets can
be huge, and most of the time a user only cares about the most recent
infoset item. So some way to follow and show just the most recent part of
the infoset is important. The current Daffodil debugger has an
"infosetLines" setting so that it only shows the most recent X number of
lines, which is mostly all a user cares about when stepping through a parse.

2) Infoset items are added and removed very frequently during a parse.
Currently, when the Daffodil debugger shows the infoset it just converts
the entire thing to XML and displays that. This doesn't work at all for
large infosets since this can take a long time. I was hoping this issue
would get resolved with this new debugging infrastructure. When the
infoset is modified, we ideally want a way to specify via DAP that parts
of the variable hierarchy were added/removed rather than having to send
the entire infoset during every variable update.

3) I can imagine a feature where a user would want to select an infoset
item and jump to the associated schema element, or query information
about that infoset item (e.g., what bit position did it start at, what
was its length). We don't have this right now, but it would be really nice
to have. This suggests that we need metadata associated with each of the
variables. Does DAP have a concept of that, and do IDEs have a way to
show it?
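
For illustration only, the kind of per-item metadata meant here could be
sketched in Python as a side table keyed by infoset path. Every name below
(`ItemMetadata`, `metadata_by_path`, `describe`, the schema file name and
line numbers) is hypothetical, and the 0-based bit offsets are an
assumption of the sketch, not a statement about how Daffodil counts bits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemMetadata:
    schema_file: str   # hypothetical: schema document declaring the element
    schema_line: int   # hypothetical: line of the element declaration
    bit_start: int     # assumption: 0-based offset of the first bit
    bit_length: int    # length of the item's representation in bits

    @property
    def bit_end(self) -> int:
        """Offset one past the last bit of the item."""
        return self.bit_start + self.bit_length


# Made-up metadata for two infoset items.
metadata_by_path = {
    "/message/header/length": ItemMetadata("message.dfdl.xsd", 42, 0, 16),
    "/message/payload": ItemMetadata("message.dfdl.xsd", 57, 16, 64),
}


def describe(path: str) -> str:
    """Render the metadata a user might ask for when selecting an item."""
    meta = metadata_by_path[path]
    return (
        f"{path}: bits {meta.bit_start}..{meta.bit_end} "
        f"({meta.bit_length} bits), declared at {meta.schema_file}:{meta.schema_line}"
    )


print(describe("/message/payload"))
```

The schema file and line are what a "jump to schema element" action would
need, and the bit range is what a data view would need to highlight.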

On 4/21/21 7:52 PM, Adam Rosien wrote:
> I've been reading up on DAP and wanted to share...
> 
>> There are many areas though that are unique to Daffodil that have no
> representation in the spec.  These things (like InputStream, Infoset, PoU,
> different variable types, backtracking, etc) will need an extension to
> DAP.  This really boils down to defining these things to fit under the DAP
> BaseProtocol and enabling handling of those objects on both the front and
> back ends.
> 
> To me, much of the current state exposed by the (Daffodil) Debugger
> translates directly to a DAP Variable[1]. DAP Variables can be
> nested/hierarchical, so they could (potentially) model larger data like the
> infoset. I can imagine shoving all the current state into Variables as a
> proof-of-concept.
> 
> It also seems like the processing stack maintained by the Daffodil PState,
> where each item references the relevant schema element, could translate to
> the DAP StackFrame type [2]. That is, the path from the schema root to the
> currently processing schema element becomes the "call stack". (Apologies if
> I don't have all the Daffodil terms lined up correctly.)
> 
> For displaying the input data and processing progress, I looked at a few
> existing VS Code extensions that provided non-builtin views, some of which
> interact with their DAP debugger code [3] [4] [5] [6].
> 
> Finally, I took a cursory look at scala-debug-adapter [7], which, for
> reference, wraps Microsoft's java-debug implementation of DAP. I was
> curious about the set of request/response and event types. Additionally,
> the Typescript API to VS Code offers custom DAP requests and responses, but
> I couldn't find the equivalent notion in the java-debug project.
> 
> .. Adam
> 
> [1]
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
> [2]
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_StackFrame
> [3] https://github.com/scalameta/metals-vscode (provides a debugger and
> non-debugger custom UI)
> [4] https://github.com/microsoft/vscode-cpptools (debugger + memory view)
> [5] https://marketplace.visualstudio.com/items?itemName=marus25.cortex-debug
> (debugger + memory view,
> https://github.com/Marus/cortex-debug/blob/master/src/frontend/memory_content_provider.ts
> )
> [6]
> https://marketplace.visualstudio.com/items?itemName=slevesque.vscode-hexdump
> (extension for hexdumps that could be controlled by other extensions)
> [7] https://github.com/scalacenter/scala-debug-adapter
> [8] https://github.com/microsoft/java-debug
> 
> On Tue, Apr 20, 2021 at 7:08 AM John Wass <jw...@gmail.com> wrote:
> 
>>> Going to look deeper into how DAP might fit with Daffodil
>>
>> Have been looking over DAP and getting a good feeling about it. The
>> specification [1] seems general enough that it could be applied to Daffodil
>> and cover a swath of common operations (like start, stop, break, continue,
>> code locations, variables, etc).
>>
>> There are many areas though that are unique to Daffodil that have no
>> representation in the spec.  These things (like InputStream, Infoset, PoU,
>> different variable types, backtracking, etc) will need an extension to
>> DAP.  This really boils down to defining these things to fit under the DAP
>> BaseProtocol and enabling handling of those objects on both the front and
>> back ends.
>>
>> On the backend we need a Daffodil DAP protocol server.  Existing JVM
>> implementations (like Java [2], Scala [3]) are tied closely to JDI and
>> would bring a lot of extra baggage to work around that.  Developing a
>> Daffodil specific implementation is no small task, but feasible.  There are
>> several existing implementations on the JVM that are close and can be
>> looked at for reference.
>>
>> The backend implementation would look similar to what was described in an
>> earlier post.  We could use ZIO/Akka/etc to implement the backend Protocol
>> Server to enable the IO between the Daffodil process and the DAP clients.
>> This implementation would now be guided by the DAP specification.
>>
>> With the protocol and backend extended to fit Daffodil that leaves the
>> frontend.  In theory an existing IDE plugin should get pretty close to
>> being able to perform the common debug operations mentioned above.  To
>> support the Daffodil extensions there will need to be handling of the
>> extended protocol into whatever views are desired/applicable.
>>
>>> Also looking into the Java Debug Interface (JDI) for comparison.
>>
>> JDI appears to be the wrong level of abstraction for what we are talking
>> about in debugging Daffodil for schema development.  While DAP does do JVM
>> debugging (through a JDI DAP impl) it also generalizes to many other
>> debugging scenarios.  JDI on the other hand is very tied to the JVM.
>>
>> Extending the JDI appears to be more complex than dealing with DAP, and
>> even though the JDI API is mostly defined with interfaces, there are choke
>> points that limit to JVM concepts.  For example jdi.Value has a finite set
>> of JVM types that it works with; it's not clear where Daffodil types would
>> plug in, if that is even possible.
>>
>> The final note is that unique Daffodil features wouldn’t get to IDE support
>> any faster with JDI.  In some cases, like VS Code, you would still need an
>> extended DAP to support these features.
>>
>>> and depending on how it shakes out will update the example to show
>> integration
>>
>> It would appear wise to investigate DAP further.  Next step is to refine
>> these thoughts with a prototype. I started an implementation in the example
>> debugger project [4] to try to run the current example on a _minimal_ DAP
>> implementation.
>>
>>
>> [1] https://microsoft.github.io/debug-adapter-protocol/specification
>> [2] https://github.com/Microsoft/java-debug
>> [3] https://github.com/scalacenter/scala-debug-adapter
>> [4] https://github.com/jw3/example-daffodil-debug
>>
>>
>> On Mon, Apr 12, 2021 at 9:58 AM John Wass <jw...@gmail.com> wrote:
>>
>>>> the code is here https://github.com/jw3/example-daffodil-debug
>>>
>>> There is now a complete console based example for Zio that demonstrates
>>> controlling the debug flow while distributing the current state to three
>>> "displays".
>>> 1. infoset at current step
>>> 2. diff of infoset against previous step
>>> 3. bit position and value of data.
>>>
>>> These displays are very rudimentary but demonstrate the ability to
>>> asynchronously populate multiple views while synchronously controlling
>> the
>>> debug loop.
>>>
>>>> - The new protocol being informed by existing debugger and DAP is key
>>>
>>> Going to look deeper into how DAP might fit with Daffodil, and depending
>>> on how it shakes out will update the example to show integration.
>>>
>>> Some interesting links to start with
>>> - https://github.com/scalacenter/scala-debug-adapter
>>> -
>>>
>> https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
>>> - https://github.com/microsoft/java-debug
>>>
>>> Also looking into the Java Debug Interface (JDI) for comparison.
>>>
>>>
>>> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:
>>>
>>>> Revisiting this post after doing some debugger related work and thinking
>>>> about debug protocol/adapters to connect external tooling to the debug
>>>> process.
>>>>
>>>> This comment is good
>>>>
>>>>> This also makes me wonder if an approach worth taking for the future
>> of
>>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
>>>> Protocol". I imagine it would be loosely based on DAP (which is
>>>> essentially JSON message based) but could be targeted to the things
>> that a
>>>> DFDL schema debugger would really need. An added benefit with some
>> sort of
>>>> protocol is the debugger interface can be uncoupled from Daffodil
>>>> itself, so we could implement a TUI/GUI/whatever in any  language/GUI
>>>> framework and just have it communicate the protocol over some form of
>>>> IPC. Another benefit is that any future backends could implement this
>>>> protocol and so a single debugger could hook into different backends
>>>> without much issue. Unfortunately, defining such a protocol might be a
>>>> large task, but we do have our existing debug infrastructure and things
>>>> like DAP to guide its development/design.
>>>>
>>>> Some thoughts on this
>>>> - Defining the protocol will be a large task, but a minimal version
>>>> should get up and round tripping quickly with a minimal subset of the
>>>> protocol.
>>>> - The new protocol being informed by existing debugger and DAP is key
>>>> - Uncoupling from Daffodil is key
>>>> - Adapt the Daffodil protocol to produce DAP after the fact so as not to
>>>> constrain Daffodil debugging capability
>>>> - We don't need to tie the protocol or adapters to a single framework,
>>>> implementations of the IO layer should be simple enough to support
>> multiple
>>>> things (eg Akka, Zio, "basic" ...)
>>>> - The current debugger lives in runtime1, but can we make an abstract
>> API
>>>> that any runtime would implement?
>>>>
>>>> Maybe a solution is structured like this
>>>> - daffodil-debug-api:
>>>>   - protocol model
>>>>   - interfaces: debugger / IO adapter / etc
>>>>   - lives in daffodil repo (new subproject?)
>>>> - daffodil-debug-io-NAME
>>>>   - provides implementation of a specific IO adapter
>>>>   - multiple projects possible (daffodil-debugger-akka,
>>>> daffodil-debugger-zio, etc)
>>>>   - supported ones live in their own subprojects, but others can be
>>>> plugged in from external sources
>>>>   - ability to support multiple implementations reduces risk of lock-in
>>>> - debugger applications
>>>>   - maintained in external repositories
>>>>   - depending on the IO implementation these could execute in a
>>>> separate process or on a separate machine
>>>>   - like Steve said, could be any language / framework
>>>>
>>>> Three types of reference implementations / sample applications could
>> also
>>>> guide the development of the API
>>>>   1. a replacement for the existing TUI debugger, expected to end up
>> with
>>>> at minimum the same functionality as the current one.
>>>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
>>>>   3. an IDE integration
>>>>
>>>> Thoughts?
>>>>
>>>> Also I'm working on some reference implementations of these concepts
>>>> using Akka and Zio.  Not quite ready to talk through it yet, but the
>> code
>>>> is here https://github.com/jw3/example-daffodil-debug
>>>>
>>>>
>>>>
>>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
>>>> wrote:
>>>>
>>>>> Yep, something like that seems very reasonable for dealing with large
>>>>> infosets. But it still feels like we still run into usability issues.
>>>>> For example, what if a user wants to see more? We need some
>>>>> configuration options to increase what we've elided. It's not big, but
>>>>> every new thing that needs configuration adds complexity and decreases
>>>>> usability.
>>>>>
>>>>> And I think the only reason we are trying to spend effort eliding
>>>>> things is because we're limited to this gdb-like interface where you
>> can
>>>>> only print out a little information at a time.
>>>>>
>>>>> I think what would really help is to dump this gdb interface and instead use
>>>>> multiple windows/views. As a really close example to what I imagine, I
>>>>> recently came across this hex editor:
>>>>>
>>>>> https://www.synalysis.net/
>>>>>
>>>>> The screenshots are a bit small so it's not super clear, but this tool
>>>>> has one view for the data in hex, and one view for a tree of parsed
>>>>> results (which is very similar to our infoset). The "infoset" view has
>>>>> information like offset/length/value, and can be related back to the
>>>>> data view to find the actual bits.
>>>>>
>>>>> I imagine the "next generation daffodil debugger" to look much like
>>>>> this. As data is parsed, the infoset view fills up. This view could act
>>>>> like a standard GUI tree so you could collapse sections or scroll
>> around
>>>>> to show just the parts you care about, and have search capabilities to
>>>>> quickly jump around. The advantage here is you no longer really need
>>>>> automated eliding or heuristics for what the user *might* care about.
>>>>> You just show the whole thing and let user scroll around. As daffodil
>>>>> parses and backtracks, this tree grows or shrinks.
>>>>>
>>>>> I also imagine you could have a cursor moving around the hex view, so
>> as
>>>>> daffodil moves around (e.g. scanning for delimiters, extracting
>>>>> integers), one could update this data view to show what daffodil is
>>>>> doing and where it is.
>>>>>
>>>>> I also imagine there could be other views as well. For example, a schema
>>>>> view to show where in the schema daffodil is, and to add/remove
>>>>> breakpoints. And an information view for things like variables,
>> in-scope
>>>>> delimiters, PoU's, etc.
>>>>>
>>>>> The only reason I mention a debug protocol is that it would allow this GUI
>>>>> to be more easily written in something other than Java/Scala to take
>>>>> advantage of other GUI toolkits. It's been a long while since I've done
>>>>> anything with Java GUIs, but they seemed pretty poor the last I looked
>>>>> at them. It would even allow for a TUI, which Java has little/no support
>>>>> for. It also enables things like remote debugging if a socket IPC was
>>>>> used. Though I'm not sure all of that is necessary. Just thinking what
>>>>> would be ideal, and it can always be pared back.
>>>>>
>>>>>
>>>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>>>>>> I don't think of it as a daffodil debug protocol, but just a
>>>>> separation of concerns between display of information and the
>> behaviors of
>>>>> parse/unparse that need to be points where users can pause, and data
>>>>> structures available to display.
>>>>>>
>>>>>> E.g., it is 100% a display issue that the infoset (shown as XML) is
>>>>> clumsy, too big, etc.  The infoset is available in the processor
>> state, and
>>>>> one can examine the current node, enclosing node, prior sibling(s),
>>>>> following sibling(s), etc. One can elide contents that are too big for
>>>>> hexBinary, etc.
>>>>>>
>>>>>> I think this problem, how to display the infoset with sensible limits
>>>>> on sizing, is fairly easy to come up with some design for, that will at
>>>>> least be (1) always fairly small (2) much more useful in more cases. It
>>>>> won't be perfect but can be much better than what we do now.
>>>>>>
>>>>>> One sensible display "mode" should be that displaying the context
>>>>> surrounding the current element (when parsing or unparsing) displays at
>>>>> most N-lines. (N/2 before, N/2 after) with a maximum length of L
>> characters
>>>>> (settable within reason ?)
>>>>>>
>>>>>> Sibling and enclosing nodes would be displayed eliding their contents
>>>>> to at most 1 line.
>>>>>>
>>>>>> Here's an example of what I mean. Displaying up to M=10 lines total:
>>>>>>
>>>>>> ...
>>>>>> <enclosingParent1>
>>>>>>    ...
>>>>>>    <priorSibling2>89ab782 ...</...>
>>>>>>    <priorSibling1>some text is here and some more text</...>
>>>>>>    <currentNode>value might be some big thing which needs to be
>> elided
>>>>> ...</...>
>>>>>>    <followingSibling1> ... </...>
>>>>>>    ???
>>>>>> </enclosingParent1>
>>>>>> ???
>>>>>>
>>>>>> The </...> is just an idea to reduce XML matching end-tag clutter.
>>>>>>
>>>>>> The ... on a line alone or where element content would appear
>>>>> generally means 1 or more other siblings. The way the display above
>> starts
>>>>> with ... means that this is a relative inner nest, not starting from
>> the
>>>>> absolute root.
>>>>>>
>>>>>> The ... within simple content means that content is elided to fit on
>>>>> one line. Always follows some text characters to differentiate from the
>>>>> child-element context.
>>>>>>
>>>>>> The ??? means zero or more other siblings.
>>>>>>
>>>>>> I used bold italic above to point out that the current node would be
>>>>> highlighted somehow. Probably a way to do this that doesn't require
>> display
>>>>> modes would be useful. E.g., a text marker like ">>>" as in:
>>>>>>
>>>>>>>>> <currentNode>value .... </...>
>>>>>>
>>>>>> might be better, particularly for a trace output being dumped to a
>>>>> text file.
>>>>>>
>>>>>> I made the above example an unparser kind of example by showing a
>>>>> following sibling that exists that is after the current node.
>>>>>>
>>>>>> I think the key concept is that any sibling node is displayed in a
>> way
>>>>> that fits on one line.
>>>>>> E.g., even if the element name was really long, I'd suggest:
>>>>>>
>>>>>>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>>>>>>
>>>>>> Where the element name itself gets elided because it is too long.
>>>>>>
>>>>>> A thought. Note that the above presentation is shown as quasi-XML,
>> but
>>>>> there's nothing XML-specific about it. A JSON-friendly equivalent
>> could be
>>>>> done as well:
>>>>>>
>>>>>> enclosingParent1 = {
>>>>>>    ...
>>>>>>    priorSibling2 = "89ab782..."
>>>>>>    priorSibling1 = "some text is here and some more text"
>>>>>>    currentNode = "value might be some big thing which needs to be
>>>>> elided ..."
>>>>>>    followingSibling1 = { ... }
>>>>>>    ???
>>>>>> }
>>>>>>
>>>>>> That's enough for 1 email thread on this debug topic.
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> From: Steve Lawrence <sl...@apache.org>
>>>>>> Sent: Tuesday, January 5, 2021 2:26 PM
>>>>>> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>>>>> Subject: The future of the daffodil DFDL schema debugger?
>>>>>>
>>>>>>
>>>>>> Now that we're in a new year, I'd like to start a discussion about
>> the
>>>>>> Daffodil DFDL Schema debugger and how it might be improved to be more
>>>>>> useful.
>>>>>>
>>>>>> Note that this is not the capabilities to debug Daffodil itself in
>>>>>> something like Eclipse/IntelliJ, but the ability for Daffodil to
>>>>> provide
>>>>>> enough extra information during a parse/unparse so that a schema
>>>>>> developer can get an idea of what Daffodil is doing. This makes it
>>>>>> easier for users (rather than developers) to determine why a schema
>>>>>> isn't giving the expected parse/unparse result (either because of bad
>>>>> data
>>>>>> or a faulty schema).
>>>>>>
>>>>>> The current state of the debugger is enabled by providing the --debug
>>>>> or
>>>>>> --trace flags in the CLI. More information about that here:
>>>>>>
>>>>>> https://daffodil.apache.org/debugger/
>>>>>>
>>>>>> This enables a TUI and commands somewhat similar to GDB, providing
>>>>> things
>>>>>> like breakpoints, steps, displaying the current infoset, displaying a
>> dump
>>>>>> of the data, etc.
>>>>>>
>>>>>> Although I find this tool pretty useful, it definitely has some
>> glaring
>>>>>> issues.
>>>>>>
>>>>>> The most glaring to me is that it really isn't useful at all for
>>>>>> debugging unparse. The data dumps only include the main
>> output stream,
>>>>>> so determining things like suspensions and buffered output is
>> impossible.
>>>>>>
>>>>>> Another issue is the infoset output. When outputting the infoset, the
>>>>>> debugger currently just walks the entire thing and converts it to XML
>>>>>> and displays the XML. For large infosets, this is excessive and can make
>>>>> it
>>>>>> impossible to use, even with some configurations that limit how much
>> of
>>>>>> that infoset is actually printed to the screen. Also things like
>> large
>>>>>> hex binary blobs create excessive and unusable output.
>>>>>>
>>>>>> Another thing I feel is missing is a schema view. Right now it's very
>>>>>> difficult to know where in the schema Daffodil actually is.
>>>>>>
>>>>>> I think these issues just need some thought improvement. One could
>>>>>> imagine a better way to stringify our unparse buffers for debug. One
>>>>>> could imagine a way to receive infoset state changes so the debugger
>> can
>>>>>> track things like backtracks and remove infosets. One could imagine a
>> way
>>>>>> to display the schema.
>>>>>>
>>>>>> We just need a better way to stringify the current state of the
>> unparse
>>>>>> data including buffers, and we need a way for the debugger to
>>>>> receive
>>>>>> state change information about infoset so it can update displays
>> rather
>>>>>> than just constantly printing the entire infoset.
>>>>>>
>>>>>> However, I think another big issue is just usability in
>> general.
>>>>> I
>>>>>> think the CLI usage is reasonable, but it's not always user friendly,
>>>>>> and it is difficult to view multiple things at the same time. I think
>>>>>> because of this very few people even use this tool. So this is
>>>>>> perhaps something worth focusing on.
>>>>>>
>>>>>> My first thought to improving this usability issue would be to
>>>>> implement
>>>>>> the Debug Adapter Protocol (DAP)
>>>>>> (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
>>>>>> which many IDE's implement. With this implemented, Daffodil could be
>>>>>> plugged in to any IDE that supports it and essentially get debugging
>>>>> for
>>>>>> free, without the need to worry about the GUI elements.
>>>>>>
>>>>>> I do have concerns that this just wouldn't have enough functionality
>>>>>> that we'd really need. For example, DAP really only has the ability to show
>>>>>> code (Daffodil's equivalent is the DFDL schema). There isn't a way to
>>>>>> show a live view of the infoset or data. Most DAP IDE's do have a
>>>>>> console output, so we could potentially make it so the console output
>>>>> is
>>>>>> a live view of infoset/data. But I'm not even sure most DAP friendly
>>>>>> IDEs could support this kind of console output. Does anyone have
>>>>>> familiarity with DAP IDEs and what kinds of console capabilities
>>>>> are
>>>>>> available?
>>>>>>
>>>>>> I also looked into TUI libraries with the idea that we could just
>>>>> extend
>>>>>> our current debugger user interface to be a bit friendlier.
>>>>>> Unfortunately, there aren't too many Java/Scala TUI libraries and
>> those
>>>>>> that do exist don't have Apache friendly licenses. We also want to be
>>>>>> careful about increasing dependencies just for a debugger that many
>>>>> people
>>>>>> might not use, so large graphics libraries are probably out of the
>>>>> question.
>>>>>>
>>>>>> This also makes me wonder if an approach worth taking for the future
>> of
>>>>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
>>>>>> Protocol". I imagine it would be loosely based on DAP (which is
>>>>>> essentially JSON message based) but could be targeted to the things
>>>>> that
>>>>>> a DFDL schema debugger would really need. An added benefit with some
>>>>>> sort of protocol is the debugger interface can be uncoupled from
>>>>>> Daffodil itself, so we could implement a TUI/GUI/whatever in any
>>>>>> language/GUI framework and just have it communicate the protocol over
>>>>>> some form of IPC. Another benefit is that any future backends could
>>>>>> implement this protocol and so a single debugger could hook into
>>>>>> different backends without much issue. Unfortunately, defining such a
>>>>>> protocol might be a large task, but we do have our existing debug
>>>>>> infrastructure and things like DAP to guide its development/design.
>>>>>>
>>>>>> Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
>> we
>>>>>> really just need the few improvements mentioned to the existing
>>>>>> debugger. Is that enough to make it usable? Or is an entirely
>> different
>>>>>> approach needed to debugging schemas?
>>>>>>
>>>>>
>>>>>
>>
> 


Re: The future of the daffodil DFDL schema debugger?

Posted by Adam Rosien <ad...@rosien.net>.
I've been reading up on DAP and wanted to share...

> There are many areas though that are unique to Daffodil that have no
> representation in the spec.  These things (like InputStream, Infoset, PoU,
> different variable types, backtracking, etc) will need an extension to
> DAP.  This really boils down to defining these things to fit under the DAP
> BaseProtocol and enabling handling of those objects on both the front and
> back ends.

To me, much of the current state exposed by the (Daffodil) Debugger
translates directly to a DAP Variable[1]. DAP Variables can be
nested/hierarchical, so they could (potentially) model larger data like the
infoset. I can imagine shoving all the current state into Variables as a
proof-of-concept.
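
As a rough proof-of-concept of that mapping, here is a minimal Python
sketch that turns a nested infoset into DAP-style Variable objects. It uses
the `name`, `value`, and `variablesReference` fields of the DAP Variable
type, where a reference of 0 means "no children" and a positive reference
lets the client ask for children later. The `VariableStore` class and the
sample infoset are assumptions for illustration, not existing Daffodil or
DAP library code.

```python
import itertools


class VariableStore:
    """Hands out variablesReference ids so children can be fetched lazily."""

    def __init__(self):
        self._ids = itertools.count(1)  # 0 is reserved for "no children"
        self._children = {}

    def to_variable(self, name, node):
        """Build one DAP-style Variable for an infoset node."""
        if isinstance(node, dict):
            ref = next(self._ids)
            self._children[ref] = node
            return {
                "name": name,
                "value": f"{{{len(node)} children}}",
                "variablesReference": ref,
            }
        return {"name": name, "value": str(node), "variablesReference": 0}

    def children(self, ref):
        """Answer a `variables` request for a previously issued reference."""
        return [self.to_variable(k, v) for k, v in self._children[ref].items()]


# Made-up infoset: complex elements are dicts, simple elements are values.
infoset = {"header": {"version": 1, "length": 12}, "payload": "deadbeef"}
store = VariableStore()
root = store.to_variable("message", infoset)
top = store.children(root["variablesReference"])
print(root)
print(top)
```

Because children are only built when requested, a large infoset costs
nothing until the user expands a node in the IDE.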

It also seems like the processing stack maintained by the Daffodil PState,
where each item references the relevant schema element, could translate to
the DAP StackFrame type [2]. That is, the path from the schema root to the
currently processing schema element becomes the "call stack". (Apologies if
I don't have all the Daffodil terms lined up correctly.)
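
A minimal Python sketch of that idea follows: the path from the schema
root to the current element is turned into DAP-style StackFrame objects
(`id`, `name`, `source`, `line`, `column`). The function name, the input
shape (a list of element name and line pairs), and the schema file name are
all assumptions for illustration; nothing here reads a real PState.

```python
def stack_frames(schema_path, source_path):
    """Turn a root-to-current schema path into a DAP-style stackTrace body.

    schema_path: list of (element_name, line) pairs, root first.
    """
    frames = []
    # Debuggers conventionally list the innermost frame first, so reverse the path.
    for frame_id, (name, line) in enumerate(reversed(schema_path), start=1):
        frames.append({
            "id": frame_id,
            "name": name,
            "source": {"path": source_path},
            "line": line,
            "column": 1,
        })
    return {"stackFrames": frames, "totalFrames": len(frames)}


# Made-up path through a schema: root element first, current element last.
path = [("message", 10), ("header", 14), ("length", 17)]
body = stack_frames(path, "message.dfdl.xsd")
print([f["name"] for f in body["stackFrames"]])
```

With frames shaped like this, an IDE's call stack view would show the
enclosing schema elements, and selecting a frame would open the schema at
that line.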

For displaying the input data and processing progress, I looked at a few
existing VS Code extensions that provided non-builtin views, some of which
interact with their DAP debugger code [3] [4] [5] [6].

Finally, I took a cursory look at scala-debug-adapter [7], which, for
reference, wraps Microsoft's java-debug [8] implementation of DAP. I was
curious about the set of request/response and event types. Additionally,
the TypeScript API to VS Code offers custom DAP requests and responses, but
I couldn't find the equivalent notion in the java-debug project.

.. Adam

[1]
https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
[2]
https://microsoft.github.io/debug-adapter-protocol/specification#Types_StackFrame
[3] https://github.com/scalameta/metals-vscode (provides a debugger and
non-debugger custom UI)
[4] https://github.com/microsoft/vscode-cpptools (debugger + memory view)
[5] https://marketplace.visualstudio.com/items?itemName=marus25.cortex-debug
(debugger + memory view,
https://github.com/Marus/cortex-debug/blob/master/src/frontend/memory_content_provider.ts
)
[6]
https://marketplace.visualstudio.com/items?itemName=slevesque.vscode-hexdump
(extension for hexdumps that could be controlled by other extensions)
[7] https://github.com/scalacenter/scala-debug-adapter
[8] https://github.com/microsoft/java-debug

On Tue, Apr 20, 2021 at 7:08 AM John Wass <jw...@gmail.com> wrote:

> > Going to look deeper into how DAP might fit with Daffodil
>
> Have been looking over DAP and getting a good feeling about it. The
> specification [1] seems general enough that it could be applied to Daffodil
> and cover a swath of common operations (like start, stop, break, continue,
> code locations, variables, etc).
>
> There are many areas though that are unique to Daffodil that have no
> representation in the spec.  These things (like InputStream, Infoset, PoU,
> different variable types, backtracking, etc) will need an extension to
> DAP.  This really boils down to defining these things to fit under the DAP
> BaseProtocol and enabling handling of those objects on both the front and
> back ends.
>
> On the backend we need a Daffodil DAP protocol server.  Existing JVM
> implementations (like Java [2], Scala [3]) are tied closely to JDI and
> would bring a lot of extra baggage to work around that.  Developing a
> Daffodil specific implementation is no small task, but feasible.  There are
> several existing implementations on the JVM that are close and can be
> looked at for reference.
>
> The backend implementation would look similar to what was described in an
> earlier post.  We could use ZIO/Akka/etc to implement the backend Protocol
> Server to enable the IO between the Daffodil process and the DAP clients.
> This implementation would now be guided by the DAP specification.
>
> With the protocol and backend extended to fit Daffodil that leaves the
> frontend.  In theory an existing IDE plugin should get pretty close to
> being able to perform the common debug operations mentioned above.  To
> support the Daffodil extensions there will need to be handling of the
> extended protocol into whatever views are desired/applicable.
>
> > Also looking into the Java Debug Interface (JDI) for comparison.
>
> JDI appears to be the wrong level of abstraction for what we are talking
> about in debugging Daffodil for schema development.  While DAP does do JVM
> debugging (through a JDI DAP impl) it also generalizes to many other
> debugging scenarios.  JDI on the other hand is very tied to the JVM.
>
> Extending the JDI appears to be more complex than dealing with DAP, and
> even though the JDI API is mostly defined with interfaces, there are choke
> points that limit to JVM concepts.  For example jdi.Value has a finite set
> of JVM types that it works with; it's not clear where Daffodil types would
> plug in, if that is even possible.
>
> The final note is that unique Daffodil features wouldn’t get to IDE support
> any faster with JDI.  In some cases, like VS Code, you would still need an
> extended DAP to support these features.
>
> > and depending on how it shakes out will update the example to show
> integration
>
> It would appear wise to investigate DAP further.  Next step is to refine
> these thoughts with a prototype. I started an implementation in the example
> debugger project [4] to try to run the current example on a _minimal_ DAP
> implementation.
>
>
> [1] https://microsoft.github.io/debug-adapter-protocol/specification
> [2] https://github.com/Microsoft/java-debug
> [3] https://github.com/scalacenter/scala-debug-adapter
> [4] https://github.com/jw3/example-daffodil-debug
>
>
> On Mon, Apr 12, 2021 at 9:58 AM John Wass <jw...@gmail.com> wrote:
>
> > > the code is here https://github.com/jw3/example-daffodil-debug
> >
> > There is now a complete console based example for Zio that demonstrates
> > controlling the debug flow while distributing the current state to three
> > "displays".
> > 1. infoset at current step
> > 2. diff of infoset against previous step
> > 3. bit position and value of data.
> >
> > These displays are very rudimentary but demonstrate the ability to
> > asynchronously populate multiple views while synchronously controlling
> the
> > debug loop.
> >
> > > - The new protocol being informed by existing debugger and DAP is key
> >
> > Going to look deeper into how DAP might fit with Daffodil, and depending
> > on how it shakes out will update the example to show integration.
> >
> > Some interesting links to start with
> > - https://github.com/scalacenter/scala-debug-adapter
> > -
> >
> https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
> > - https://github.com/microsoft/java-debug
> >
> > Also looking into the Java Debug Interface (JDI) for comparison.
> >
> >
> > On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:
> >
> >> Revisiting this post after doing some debugger related work and thinking
> >> about debug protocol/adapters to connect external tooling to the debug
> >> process.
> >>
> >> This comment is good
> >>
> >> > This also makes me wonder if an approach worth taking for the future
> of
> >> Daffodil schema debugging is developing a sort of "Daffodil Debug
> >> Protocol". I imagine it would be loosely based on DAP (which is
> >> essentially JSON message based) but could be targeted to the things
> that a
> >> DFDL schema debugger would really need. An added benefit with some
> sort of
> >> protocol is the debugger interface can be uncoupled from Daffodil
> >> itself, so we could implement a TUI/GUI/whatever in any  language/GUI
> >> framework and just have it communicate the protocol over some form of
> >> IPC. Another benefit is that any future backends could implement this
> >> protocol and so a single debugger could hook into different backends
> >> without much issue. Unfortunately, defining such a protocol might be a
> >> large task, but we do have our existing debug infrastructure and things
> >> like DAP to guide its development/design.
> >>
> >> Some thoughts on this
> >> - Defining the protocol will be a large task, but a minimal version
> >> should get up and round tripping quickly with a minimal subset of the
> >> protocol.
> >> - The new protocol being informed by existing debugger and DAP is key
> >> - Uncoupling from Daffodil is key
> >> - Adapt the Daffodil protocol to produce DAP after the fact so as not to
> >> constrain Daffodil debugging capability
> >> - We don't need to tie the protocol or adapters to a single framework,
> >> implementations of the IO layer should be simple enough to support multiple
> >> things (eg Akka, Zio, "basic" ...)
> >> - The current debugger lives in runtime1, but can we make an abstract API
> >> that any runtime would implement?
> >>
> >> Maybe a solution is structured like this
> >> - daffodil-debug-api:
> >>   - protocol model
> >>   - interfaces: debugger / IO adapter / etc
> >>   - lives in daffodil repo (new subproject?)
> >> - daffodil-debug-io-NAME
> >>   - provides implementation of a specific IO adapter
> >>   - multiple projects possible (daffodil-debugger-akka,
> >> daffodil-debugger-zio, etc)
> >>   - supported ones live in their own subprojects, but others can be
> >> plugged in from external sources
> >>   - ability to support multiple implementations reduces risk of lock-in
> >> - debugger applications
> >>   - maintained in external repositories
> >>   - depending on the IO implementation these could execute in a separate
> >> process or on a separate machine
> >>   - like Steve said, could be any language / framework
> >>
> >> Three types of reference implementations / sample applications could also
> >> guide the development of the API
> >>   1. a replacement for the existing TUI debugger, expected to end up with
> >> at minimum the same functionality as the current one.
> >>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
> >>   3. an IDE integration
> >>
> >> Thoughts?
> >>
> >> Also I'm working on some reference implementations of these concepts
> >> using Akka and Zio.  Not quite ready to talk through it yet, but the code
> >> is here https://github.com/jw3/example-daffodil-debug
> >>
> >>
> >>
> >> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
> >> wrote:
> >>
> >>> Yep, something like that seems very reasonable for dealing with large
> >>> infosets. But it still feels like we run into usability issues.
> >>> For example, what if a user wants to see more? We need some
> >>> configuration options to increase what we've elided. It's not big, but
> >>> every new thing that needs configuration adds complexity and decreases
> >>> usability.
> >>>
> >>> And I think the only reason we are trying to spend effort eliding
> >>> things is because we're limited to this gdb-like interface where you can
> >>> only print out a little information at a time.
> >>>
> >>> I think what would really help is to dump this gdb interface and instead use
> >>> multiple windows/views. As a really close example to what I imagine, I
> >>> recently came across this hex editor:
> >>>
> >>> https://www.synalysis.net/
> >>>
> >>> The screenshots are a bit small so it's not super clear, but this tool
> >>> has one view for the data in hex, and one view for a tree of parsed
> >>> results (which is very similar to our infoset). The "infoset" view has
> >>> information like offset/length/value, and can be related back to the
> >>> data view to find the actual bits.
> >>>
> >>> I imagine the "next generation daffodil debugger" to look much like
> >>> this. As data is parsed, the infoset view fills up. This view could act
> >>> like a standard GUI tree so you could collapse sections or scroll around
> >>> to show just the parts you care about, and have search capabilities to
> >>> quickly jump around. The advantage here is you no longer really need
> >>> automated eliding or heuristics for what the user *might* care about.
> >>> You just show the whole thing and let user scroll around. As daffodil
> >>> parses and backtracks, this tree grows or shrinks.
> >>>
> >>> I also imagine you could have a cursor moving around the hex view, so as
> >>> daffodil moves around (e.g. scanning for delimiters, extracting
> >>> integers), one could update this data view to show what daffodil is
> >>> doing and where it is.
> >>>
> >>> I also imagine there could be other views as well. For example, a schema
> >>> view to show where in the schema daffodil is, and to add/remove
> >>> breakpoints. And an information view for things like variables, in-scope
> >>> delimiters, PoU's, etc.
> >>>
> >>> The only reason I mention a debug protocol is that it would allow this GUI
> >>> to be more easily written in something other than Java/Scala to take
> >>> advantage of other GUI toolkits. It's been a long while since I've done
> >>> anything with Java GUIs, but they seemed pretty poor the last time I looked
> >>> at them. It would even allow for a TUI, which Java has little/no support
> >>> for. It also enables things like remote debugging if a socket IPC were
> >>> used. Though I'm not sure all of that is necessary. Just thinking what
> >>> would be ideal, and it can always be pared back.
> >>>
> >>>
> >>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> >>> > I don't think of it as a daffodil debug protocol, but just a
> >>> separation of concerns between display of information and the behaviors of
> >>> parse/unparse that need to be points where users can pause, and data
> >>> structures available to display.
> >>> >
> >>> > E.g., it is 100% a display issue that the infoset (shown as XML) is
> >>> clumsy, too big, etc.  The infoset is available in the processor state, and
> >>> one can examine the current node, enclosing node, prior sibling(s),
> >>> following sibling(s), etc. One can elide contents that are too big for
> >>> hexBinary, etc.
> >>> >
> >>> > I think this problem, how to display the infoset with sensible limits
> >>> on sizing, is fairly easy to come up with some design for, that will at
> >>> least be (1) always fairly small (2) much more useful in more cases. It
> >>> won't be perfect but can be much better than what we do now.
> >>> >
> >>> > One sensible display "mode" should be that displaying the context
> >>> surrounding the current element (when parsing or unparsing) displays at
> >>> most N lines (N/2 before, N/2 after) with a maximum length of L characters
> >>> (settable within reason?)
> >>> >
> >>> > Sibling and enclosing nodes would be displayed eliding their contents
> >>> to at most 1 line.
> >>> >
> >>> > Here's an example of what I mean. Displaying up to N=10 lines total:
> >>> >
> >>> > ...
> >>> > <enclosingParent1>
> >>> >    ...
> >>> >    <priorSibling2>89ab782 ...</...>
> >>> >    <priorSibling1>some text is here and some more text</...>
> >>> >    <currentNode>value might be some big thing which needs to be elided ...</...>
> >>> >    <followingSibling1> ... </...>
> >>> >    ???
> >>> > </enclosingParent1>
> >>> > ???
> >>> >
> >>> > The </...> is just an idea to reduce XML matching end-tag clutter.
> >>> >
> >>> > The ... on a line alone or where element content would appear
> >>> generally means 1 or more other siblings. The way the display above starts
> >>> with ... means that this is a relative inner nest, not starting from the
> >>> absolute root.
> >>> >
> >>> > The ... within simple content means that content is elided to fit on
> >>> one line. Always follows some text characters to differentiate from the
> >>> child-element context.
> >>> >
> >>> > The ??? means zero or more other siblings.
> >>> >
> >>> > I used bold italic above to point out that the current node would be
> >>> highlighted somehow. Probably a way to do this that doesn't require display
> >>> modes would be useful. E.g., a text marker like ">>>" as in:
> >>> >
> >>> > >>> <currentNode>value .... </...>
> >>> >
> >>> > might be better, particularly for a trace output being dumped to a
> >>> text file.
> >>> >
> >>> > I made the above example an unparser kind of example by showing a
> >>> following sibling that exists that is after the current node.
> >>> >
> >>> > I think the key concept is that any sibling node is displayed in a way
> >>> that fits on one line.
> >>> > E.g., even if the element name was really long, I'd suggest:
> >>> >
> >>> >   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> >>> >
> >>> > Where the element name itself gets elided because it is too long.
> >>> >
> >>> > A thought. Note that the above presentation is shown as quasi-XML, but
> >>> there's nothing XML-specific about it. A JSON-friendly equivalent could be
> >>> done as well:
> >>> >
> >>> > enclosingParent1 = {
> >>> >    ...
> >>> >    priorSibling2 = "89ab782..."
> >>> >    priorSibling1 = "some text is here and some more text"
> >>> >    currentNode = "value might be some big thing which needs to be elided ..."
> >>> >    followingSibling1 = { ... }
> >>> >    ???
> >>> > }
> >>> >
> >>> > That's enough for 1 email thread on this debug topic.
> >>> >
> >>> >
> >>> > ________________________________
> >>> > From: Steve Lawrence <sl...@apache.org>
> >>> > Sent: Tuesday, January 5, 2021 2:26 PM
> >>> > To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> >>> > Subject: The future of the daffodil DFDL schema debugger?
> >>> >
> >>> >
> >>> > Now that we're in a new year, I'd like to start a discussion about the
> >>> > Daffodil DFDL Schema debugger and how it might be improved to be more
> >>> > useful.
> >>> >
> >>> > Note that this is not the capability to debug Daffodil itself in
> >>> > something like Eclipse/IntelliJ, but the ability for Daffodil to provide
> >>> > enough extra information during a parse/unparse so that a schema
> >>> > developer can get an idea of what Daffodil is doing. This makes it
> >>> > easier for users (rather than developers) to determine why a schema
> >>> > isn't giving the expected parse/unparse result (either because of bad data
> >>> > or a faulty schema).
> >>> >
> >>> > The current state of the debugger is enabled by providing the --debug or
> >>> > --trace flags in the CLI. More information about that here:
> >>> >
> >>> > https://daffodil.apache.org/debugger/
> >>> >
> >>> > This enables a TUI and commands somewhat similar to GDB, providing things
> >>> > like breakpoints, steps, displaying the current infoset, displaying a dump
> >>> > of the data, etc.
> >>> >
> >>> > Although I find this tool pretty useful, it definitely has some glaring
> >>> > issues.
> >>> >
> >>> > The most glaring to me is that it really isn't useful at all for
> >>> > debugging unparse. The data dumps only include the main output stream,
> >>> > so determining things like suspensions and buffered output is impossible.
> >>> >
> >>> > Another issue is the infoset output. When outputting the infoset, the
> >>> > debugger currently just walks the entire thing and converts it to XML
> >>> > and displays the XML. For large infosets, this is excessive and can make it
> >>> > impossible to use, even with some configurations that limit how much of
> >>> > that infoset is actually printed to the screen. Also things like large
> >>> > hex binary blobs create excessive and unusable output.
> >>> >
> >>> > Another thing I feel is missing is a schema view. Right now it's very
> >>> > difficult to know where in the schema Daffodil actually is.
> >>> >
> >>> > I think these issues just need some thought and improvement. One could
> >>> > imagine a better way to stringify our unparse buffers for debug. One
> >>> > could imagine a way to receive infoset state changes so the debugger can
> >>> > track things like backtracks and remove infosets. One could imagine a way
> >>> > to display the schema.
> >>> >
> >>> > We just need a better way to stringify the current state of the unparse
> >>> > data including buffers, and we need a way for the debugger to receive
> >>> > state change information about the infoset so it can update displays rather
> >>> > than just constantly printing the entire infoset.
> >>> >
> >>> > However, I think another big issue is just usability in general. I
> >>> > think the CLI usage is reasonable, but it's not always user friendly,
> >>> > and it is difficult to view multiple things at the same time. I think
> >>> > because of this very few people even use this tool. So this seems like
> >>> > perhaps something worth focusing on.
> >>> >
> >>> > My first thought to improving this usability issue would be to implement
> >>> > the Debug Adapter Protocol (DAP)
> >>> > (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
> >>> > which many IDEs implement. With this implemented, Daffodil could be
> >>> > plugged in to any IDE that supports it and essentially get debugging for
> >>> > free, without the need to worry about the GUI elements.
> >>> >
> >>> > I do have concerns that this just wouldn't have enough of the functionality
> >>> > that we'd really need. For example, DAP really only has the ability to show
> >>> > code (Daffodil's equivalent is the DFDL schema). There isn't a way to
> >>> > show a live view of the infoset or data. Most DAP IDEs do have a
> >>> > console output, so we could potentially make it so the console output is
> >>> > a live view of infoset/data. But I'm not even sure most DAP friendly
> >>> > IDEs could support this kind of console output. Does anyone have
> >>> > familiarity with DAP IDEs and what kinds of console capabilities are
> >>> > available?
> >>> >
> >>> > I also looked into TUI libraries with the idea that we could just extend
> >>> > our current debugger user interface to be a bit friendlier.
> >>> > Unfortunately, there aren't too many Java/Scala TUI libraries and those
> >>> > that do exist don't have Apache friendly licenses. We also want to be
> >>> > careful about increasing dependencies just for a debugger that many people
> >>> > might not use, so large graphics libraries are probably out of the question.
> >>> >
> >>> > This also makes me wonder if an approach worth taking for the future of
> >>> > Daffodil schema debugging is developing a sort of "Daffodil Debug
> >>> > Protocol". I imagine it would be loosely based on DAP (which is
> >>> > essentially JSON message based) but could be targeted to the things that
> >>> > a DFDL schema debugger would really need. An added benefit with some
> >>> > sort of protocol is the debugger interface can be uncoupled from
> >>> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
> >>> > language/GUI framework and just have it communicate the protocol over
> >>> > some form of IPC. Another benefit is that any future backends could
> >>> > implement this protocol and so a single debugger could hook into
> >>> > different backends without much issue. Unfortunately, defining such a
> >>> > protocol might be a large task, but we do have our existing debug
> >>> > infrastructure and things like DAP to guide its development/design.
> >>> >
> >>> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps we
> >>> > really just need the few improvements mentioned to the existing
> >>> > debugger. Is that enough to make it usable? Or is an entirely different
> >>> > approach needed to debugging schemas?
> >>> >
> >>>
> >>>
>

Re: The future of the daffodil DFDL schema debugger?

Posted by John Wass <jw...@gmail.com>.
> Going to look deeper into how DAP might fit with Daffodil

Have been looking over DAP and getting a good feeling about it. The
specification [1] seems general enough that it could be applied to Daffodil
and cover a swath of common operations (like start, stop, break, continue,
code locations, variables, etc).

There are many areas though that are unique to Daffodil that have no
representation in the spec.  These things (like InputStream, Infoset, PoU,
different variable types, backtracking, etc) will need an extension to
DAP.  This really boils down to defining these things to fit under the DAP
BaseProtocol and enabling handling of those objects on both the front and
back ends.
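
As a rough illustration of what fitting a Daffodil item under the DAP base
protocol could look like, here is a minimal Python sketch. DAP frames each
JSON message with a Content-Length header, and every message carries "seq"
and "type" fields, with events adding "event" and "body". The event name
"daffodil.infoset" and its body fields are hypothetical; they are not part
of the DAP specification or of Daffodil.

```python
import json


def encode_message(message: dict) -> bytes:
    """Frame a message the way DAP does: Content-Length header, blank line, JSON body."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(frame: bytes) -> dict:
    """Parse one framed message back into a dict."""
    header, _, rest = frame.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip().decode("ascii"))
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


def infoset_event(seq: int, xml: str, bit_position: int) -> dict:
    """Build a custom event carrying the current infoset.

    Hypothetical extension: the event name and body layout are assumptions
    made for this sketch only.
    """
    return {
        "seq": seq,
        "type": "event",
        "event": "daffodil.infoset",
        "body": {"content": xml, "bitPosition": bit_position},
    }
```

A front end that does not know the custom event can simply ignore it, which
is what makes this kind of extension attractive; a Daffodil-aware front end
would route the body into an infoset view.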

On the backend we need a Daffodil DAP protocol server.  Existing JVM
implementations (like Java [2], Scala [3]) are tied closely to JDI and
would bring a lot of extra baggage to work around that.  Developing a
Daffodil specific implementation is no small task, but feasible.  There are
several existing implementations on the JVM that are close and can be
looked at for reference.

The backend implementation would look similar to what was described in an
earlier post.  We could use ZIO/Akka/etc to implement the backend Protocol
Server to enable the IO between the Daffodil process and the DAP clients.
This implementation would now be guided by the DAP specification.
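
A minimal sketch of the request handling such a backend would do, written
in Python for brevity rather than with ZIO or Akka, and leaving out the IO
layer entirely. The "initialize", "next" and "continue" commands and the
response fields (request_seq, success, command, body, message) come from
the DAP specification; StubSession and ProtocolServer are hypothetical
names, and the stub only counts steps instead of driving a real parse.

```python
class StubSession:
    """Hypothetical stand-in for a Daffodil parse that can be stepped."""

    def __init__(self, total_steps: int):
        self.step = 0
        self.total_steps = total_steps

    def step_once(self) -> None:
        if self.step < self.total_steps:
            self.step += 1

    def run_to_end(self) -> None:
        self.step = self.total_steps


class ProtocolServer:
    """Maps DAP request commands onto the debug session (sketch only)."""

    def __init__(self, session: StubSession):
        self.session = session
        self.seq = 0
        self.handlers = {
            "initialize": self._initialize,
            "next": self._next,
            "continue": self._continue,
        }

    def _initialize(self, arguments: dict) -> dict:
        # A real server would return its capabilities here.
        return {}

    def _next(self, arguments: dict) -> dict:
        self.session.step_once()
        return {}

    def _continue(self, arguments: dict) -> dict:
        self.session.run_to_end()
        return {}

    def handle(self, request: dict) -> dict:
        """Build the DAP response for one request."""
        command = request["command"]
        handler = self.handlers.get(command)
        self.seq += 1
        response = {
            "seq": self.seq,
            "type": "response",
            "request_seq": request["seq"],
            "command": command,
            "success": handler is not None,
        }
        if handler is None:
            response["message"] = f"unsupported command: {command}"
        else:
            response["body"] = handler(request.get("arguments", {}))
        return response
```

Keeping the dispatch table separate from the transport is one way to leave
the choice of IO framework open, as suggested in the earlier post.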

With the protocol and backend extended to fit Daffodil that leaves the
frontend.  In theory an existing IDE plugin should get pretty close to
being able to perform the common debug operations mentioned above.  To
support the Daffodil extensions there will need to be handling of the
extended protocol into whatever views are desired/applicable.

> Also looking into the Java Debug Interface (JDI) for comparison.

JDI appears to be the wrong level of abstraction for what we are talking
about in debugging Daffodil for schema development.  While DAP does do JVM
debugging (through a JDI DAP impl) it also generalizes to many other
debugging scenarios.  JDI on the other hand is very tied to the JVM.

Extending the JDI appears to be more complex than dealing with DAP, and
even though the JDI API is mostly defined with interfaces, there are choke
points that limit it to JVM concepts.  For example jdi.Value has a finite set
of JVM types that it works with; it's not clear where Daffodil types would
plug in, if that is even possible.

The final note is that unique Daffodil features wouldn’t get IDE support
any faster with JDI.  In some cases, like VS Code, you would still need an
extended DAP to support these features.

> and depending on how it shakes out will update the example to show
> integration

It would appear wise to investigate DAP further.  Next step is to refine
these thoughts with a prototype. I started an implementation in the example
debugger project [4] to try to run the current example on a _minimal_ DAP
implementation.


[1] https://microsoft.github.io/debug-adapter-protocol/specification
[2] https://github.com/Microsoft/java-debug
[3] https://github.com/scalacenter/scala-debug-adapter
[4] https://github.com/jw3/example-daffodil-debug


On Mon, Apr 12, 2021 at 9:58 AM John Wass <jw...@gmail.com> wrote:

> > the code is here https://github.com/jw3/example-daffodil-debug
>
> There is now a complete console based example for Zio that demonstrates
> controlling the debug flow while distributing the current state to three
> "displays".
> 1. infoset at current step
> 2. diff of infoset against previous step
> 3. bit position and value of data.
>
> These displays are very rudimentary but demonstrate the ability to
> asynchronously populate multiple views while synchronously controlling the
> debug loop.
>
> > - The new protocol being informed by existing debugger and DAP is key
>
> Going to look deeper into how DAP might fit with Daffodil, and depending
> on how it shakes out will update the example to show integration.
>
> Some interesting links to start with
> - https://github.com/scalacenter/scala-debug-adapter
> -
> https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
> - https://github.com/microsoft/java-debug
>
> Also looking into the Java Debug Interface (JDI) for comparison.
>
>
> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:
>
>> Revisiting this post after doing some debugger related work and thinking
>> about debug protocol/adapters to connect external tooling to the debug
>> process.
>>
>> This comment is good
>>
>> > This also makes me wonder if an approach worth taking for the future of
>> Daffodil schema debugging is developing a sort of "Daffodil Debug
>> Protocol". I imagine it would be loosely based on DAP (which is
>> essentially JSON message based) but could be targeted to the things that a
>> DFDL schema debugger would really need. An added benefit with some sort of
>> protocol is the debugger interface can be uncoupled from Daffodil
>> itself, so we could implement a TUI/GUI/whatever in any language/GUI
>> framework and just have it communicate the protocol over some form of
>> IPC. Another benefit is that any future backends could implement this
>> protocol and so a single debugger could hook into different backends
>> without much issue. Unfortunately, defining such a protocol might be a
>> large task, but we do have our existing debug infrastructure and things
>> like DAP to guide its development/design.
>>
>> Some thoughts on this
>> - Defining the protocol will be a large task, but a minimal version
>> should get up and round tripping quickly with a minimal subset of the
>> protocol.
>> - The new protocol being informed by existing debugger and DAP is key
>> - Uncoupling from Daffodil is key
>> - Adapt the Daffodil protocol to produce DAP after the fact so as not to
>> constrain Daffodil debugging capability
>> - We don't need to tie the protocol or adapters to a single framework,
>> implementations of the IO layer should be simple enough to support multiple
>> things (eg Akka, Zio, "basic" ...)
>> - The current debugger lives in runtime1, but can we make an abstract API
>> that any runtime would implement?
>>
>> Maybe a solution is structured like this
>> - daffodil-debug-api:
>>   - protocol model
>>   - interfaces: debugger / IO adapter / etc
>>   - lives in daffodil repo (new subproject?)
>> - daffodil-debug-io-NAME
>>   - provides implementation of a specific IO adapter
>>   - multiple projects possible (daffodil-debugger-akka,
>> daffodil-debugger-zio, etc)
>>   - supported ones live in their own subprojects, but others can be
>> plugged in from external sources
>>   - ability to support multiple implementations reduces risk of lock-in
>> - debugger applications
>>   - maintained in external repositories
>>   - depending on the IO implementation these could execute in a separate
>> process or on a separate machine
>>   - like Steve said, could be any language / framework
>>
>> Three types of reference implementations / sample applications could also
>> guide the development of the API
>>   1. a replacement for the existing TUI debugger, expected to end up with
>> at minimum the same functionality as the current one.
>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
>>   3. an IDE integration
>>
>> Thoughts?
>>
>> Also I'm working on some reference implementations of these concepts
>> using Akka and Zio.  Not quite ready to talk through it yet, but the code
>> is here https://github.com/jw3/example-daffodil-debug
>>
>>
>>
>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
>> wrote:
>>
>>> Yep, something like that seems very reasonable for dealing with large
>>> infosets. But it still feels like we run into usability issues.
>>> For example, what if a user wants to see more? We need some
>>> configuration options to increase what we've elided. It's not big, but
>>> every new thing that needs configuration adds complexity and decreases
>>> usability.
>>>
>>> And I think the only reason we are trying to spend effort eliding
>>> things is because we're limited to this gdb-like interface where you can
>>> only print out a little information at a time.
>>>
>>> I think what would really help is to dump this gdb interface and instead use
>>> multiple windows/views. As a really close example to what I imagine, I
>>> recently came across this hex editor:
>>>
>>> https://www.synalysis.net/
>>>
>>> The screenshots are a bit small so it's not super clear, but this tool
>>> has one view for the data in hex, and one view for a tree of parsed
>>> results (which is very similar to our infoset). The "infoset" view has
>>> information like offset/length/value, and can be related back to the
>>> data view to find the actual bits.
>>>
>>> I imagine the "next generation daffodil debugger" to look much like
>>> this. As data is parsed, the infoset view fills up. This view could act
>>> like a standard GUI tree so you could collapse sections or scroll around
>>> to show just the parts you care about, and have search capabilities to
>>> quickly jump around. The advantage here is you no longer really need
>>> automated eliding or heuristics for what the user *might* care about.
>>> You just show the whole thing and let user scroll around. As daffodil
>>> parses and backtracks, this tree grows or shrinks.
>>>
>>> I also imagine you could have a cursor moving around the hex view, so as
>>> daffodil moves around (e.g. scanning for delimiters, extracting
>>> integers), one could update this data view to show what daffodil is
>>> doing and where it is.
>>>
>>> I also imagine there could be other views as well. For example, a schema
>>> view to show where in the schema daffodil is, and to add/remove
>>> breakpoints. And an information view for things like variables, in-scope
>>> delimiters, PoU's, etc.
>>>
>>> The only reason I mention a debug protocol is that it would allow this GUI
>>> to be more easily written in something other than Java/Scala to take
>>> advantage of other GUI toolkits. It's been a long while since I've done
>>> anything with Java GUIs, but they seemed pretty poor the last time I looked
>>> at them. It would even allow for a TUI, which Java has little/no support
>>> for. It also enables things like remote debugging if a socket IPC were
>>> used. Though I'm not sure all of that is necessary. Just thinking what
>>> would be ideal, and it can always be pared back.
>>>
>>>
>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>>> > I don't think of it as a daffodil debug protocol, but just a
>>> separation of concerns between display of information and the behaviors of
>>> parse/unparse that need to be points where users can pause, and data
>>> structures available to display.
>>> >
>>> > E.g., it is 100% a display issue that the infoset (shown as XML) is
>>> clumsy, too big, etc.  The infoset is available in the processor state, and
>>> one can examine the current node, enclosing node, prior sibling(s),
>>> following sibling(s), etc. One can elide contents that are too big for
>>> hexBinary, etc.
>>> >
>>> > I think this problem, how to display the infoset with sensible limits
>>> on sizing, is fairly easy to come up with some design for, that will at
>>> least be (1) always fairly small (2) much more useful in more cases. It
>>> won't be perfect but can be much better than what we do now.
>>> >
>>> > One sensible display "mode" should be that displaying the context
>>> surrounding the current element (when parsing or unparsing) displays at
>>> most N lines (N/2 before, N/2 after) with a maximum length of L characters
>>> (settable within reason?)
>>> >
>>> > Sibling and enclosing nodes would be displayed eliding their contents
>>> to at most 1 line.
>>> >
>>> > Here's an example of what I mean. Displaying up to N=10 lines total:
>>> >
>>> > ...
>>> > <enclosingParent1>
>>> >    ...
>>> >    <priorSibling2>89ab782 ...</...>
>>> >    <priorSibling1>some text is here and some more text</...>
>>> >    <currentNode>value might be some big thing which needs to be elided ...</...>
>>> >    <followingSibling1> ... </...>
>>> >    ???
>>> > </enclosingParent1>
>>> > ???
>>> >
>>> > The </...> is just an idea to reduce XML matching end-tag clutter.
>>> >
>>> > The ... on a line alone or where element content would appear
>>> generally means 1 or more other siblings. The way the display above starts
>>> with ... means that this is a relative inner nest, not starting from the
>>> absolute root.
>>> >
>>> > The ... within simple content means that content is elided to fit on
>>> one line. Always follows some text characters to differentiate from the
>>> child-element context.
>>> >
>>> > The ??? means zero or more other siblings.
>>> >
>>> > I used bold italic above to point out that the current node would be
>>> highlighted somehow. Probably a way to do this that doesn't require display
>>> modes would be useful. E.g., a text marker like ">>>" as in:
>>> >
>>> > >>> <currentNode>value .... </...>
>>> >
>>> > might be better, particularly for a trace output being dumped to a
>>> text file.
>>> >
>>> > I made the above example an unparser kind of example by showing a
>>> following sibling that exists that is after the current node.
>>> >
>>> > I think the key concept is that any sibling node is displayed in a way
>>> that fits on one line.
>>> > E.g., even if the element name was really long, I'd suggest:
>>> >
>>> >   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>>> >
>>> > Where the element name itself gets elided because it is too long.
>>> >
>>> > A thought. Note that the above presentation is shown as quasi-XML, but
>>> there's nothing XML-specific about it. A JSON-friendly equivalent could be
>>> done as well:
>>> >
>>> > enclosingParent1 = {
>>> >    ...
>>> >    priorSibling2 = "89ab782..."
>>> >    priorSibling1 = "some text is here and some more text"
>>> >    currentNode = "value might be some big thing which needs to be elided ..."
>>> >    followingSibling1 = { ... }
>>> >    ???
>>> > }
>>> >
>>> > That's enough for 1 email thread on this debug topic.
>>> >
>>> >
>>> > ________________________________
>>> > From: Steve Lawrence <sl...@apache.org>
>>> > Sent: Tuesday, January 5, 2021 2:26 PM
>>> > To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>>> > Subject: The future of the daffodil DFDL schema debugger?
>>> >
>>> >
>>> > Now that we're in a new year, I'd like to start a discussion about the
>>> > Daffodil DFDL Schema debugger and how it might be improved to be more
>>> > useful.
>>> >
>>> > Note that this is not the capability to debug Daffodil itself in
>>> > something like Eclipse/IntelliJ, but the ability for Daffodil to provide
>>> > enough extra information during a parse/unparse so that a schema
>>> > developer can get an idea of what Daffodil is doing. This makes it
>>> > easier for users (rather than developers) to determine why a schema
>>> > isn't giving the expected parse/unparse result (either because of bad data
>>> > or a faulty schema).
>>> >
>>> > The current state of the debugger is enabled by providing the --debug or
>>> > --trace flags in the CLI. More information about that here:
>>> >
>>> > https://daffodil.apache.org/debugger/
>>> >
>>> > This enables a TUI and commands somewhat similar to GDB, providing things
>>> > like breakpoints, steps, displaying the current infoset, displaying a dump
>>> > of the data, etc.
>>> >
>>> > Although I find this tool pretty useful, it definitely has some glaring
>>> > issues.
>>> >
>>> > The most glaring to me is that it really isn't useful at all for
>>> > debugging unparse. The data dumps only include the main output stream,
>>> > so determining things like suspensions and buffered output is impossible.
>>> >
>>> > Another issue is the infoset output. When outputting the infoset, the
>>> > debugger currently just walks the entire thing and converts it to XML
>>> > and displays the XML. For large infosets, this is excessive and can make it
>>> > impossible to use, even with some configurations that limit how much of
>>> > that infoset is actually printed to the screen. Also things like large
>>> > hex binary blobs create excessive and unusable output.
>>> >
>>> > Another thing I feel is missing is a schema view. Right now it's very
>>> > difficult to know where in the schema Daffodil actually is.
>>> >
>>> > I think these issues just need some thought and improvement. One could
>>> > imagine a better way to stringify our unparse buffers for debug. One
>>> > could imagine a way to receive infoset state changes so the debugger can
>>> > track things like backtracks and remove infosets. One could imagine a way
>>> > to display the schema.
>>> >
>>> > We just need a better way to stringify the current state of the unparse
>>> > data including buffers, and we need a way for the debugger to receive
>>> > state change information about the infoset so it can update displays rather
>>> > than just constantly printing the entire infoset.
>>> >
>>> > However, I think another big issue is just usability in general. I
>>> > think the CLI usage is reasonable, but it's not always user friendly,
>>> > and it is difficult to view multiple things at the same time. I think
>>> > because of this very few people even use this tool. So this is
>>> > perhaps something worth focusing on.
>>> >
>>> > My first thought on improving this usability issue would be to implement
>>> > the Debug Adapter Protocol (DAP)
>>> > (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
>>> > which many IDEs implement. With this implemented, Daffodil could be
>>> > plugged in to any IDE that supports it and essentially get debugging for
>>> > free, without the need to worry about the GUI elements.
>>> >
>>> > I do have concerns that this just wouldn't have enough of the functionality
>>> > that we'd really need. For example, DAP really only has the ability to show
>>> > code (Daffodil's equivalent is the DFDL schema). There isn't a way to
>>> > show a live view of the infoset or data. Most DAP IDEs do have a
>>> > console output, so we could potentially make it so the console output is
>>> > a live view of infoset/data. But I'm not even sure most DAP-friendly
>>> > IDEs could support this kind of console output. Does anyone have
>>> > familiarity with DAP IDEs and what kinds of console capabilities are
>>> > available?
>>> >
>>> > I also looked into TUI libraries with the idea that we could just extend
>>> > our current debugger user interface to be a bit friendlier.
>>> > Unfortunately, there aren't too many Java/Scala TUI libraries and those
>>> > that do exist don't have Apache-friendly licenses. We also want to be
>>> > careful about increasing dependencies just for a debugger that many people
>>> > might not use, so large graphics libraries are probably out of the
>>> > question.
>>> >
>>> > This also makes me wonder if an approach worth taking for the future of
>>> > Daffodil schema debugging is developing a sort of "Daffodil Debug
>>> > Protocol". I imagine it would be loosely based on DAP (which is
>>> > essentially JSON message based) but could be targeted to the things that
>>> > a DFDL schema debugger would really need. An added benefit with some
>>> > sort of protocol is the debugger interface can be uncoupled from
>>> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
>>> > language/GUI framework and just have it communicate the protocol over
>>> > some form of IPC. Another benefit is that any future backends could
>>> > implement this protocol and so a single debugger could hook into
>>> > different backends without much issue. Unfortunately, defining such a
>>> > protocol might be a large task, but we do have our existing debug
>>> > infrastructure and things like DAP to guide its development/design.
>>> >
>>> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps we
>>> > really just need the few improvements mentioned to the existing
>>> > debugger. Is that enough to make it usable? Or is an entirely different
>>> > approach needed to debugging schemas?
>>> >
>>>
>>>

Re: The future of the daffodil DFDL schema debugger?

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
Welcome Adam,

Here's the link to Adam's book, which looks very useful.

(Not shameless self promotion if someone else sends the link 🙂)

https://essentialeffects.dev/

-mikeb


________________________________
From: Adam Rosien <ad...@rosien.net>
Sent: Monday, April 19, 2021 11:21 AM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: The future of the daffodil DFDL schema debugger?

Hi everybody, I've recently started working on Daffodil with some other
folks and will be helping where I can with the debugger.

I've been writing Scala since ~2011 and recently wrote a book about Cats
Effect, which has a similar scope to ZIO (effects, concurrency, etc.). If
anybody has any questions about the approach and techniques, I'm happy to
help.

.. Adam



Re: The future of the daffodil DFDL schema debugger?

Posted by Adam Rosien <ad...@rosien.net>.
Hi everybody, I've recently started working on Daffodil with some other
folks and will be helping where I can with the debugger.

I've been writing Scala since ~2011 and recently wrote a book about Cats
Effect, which has a similar scope to ZIO (effects, concurrency, etc.). If
anybody has any questions about the approach and techniques, I'm happy to
help.

.. Adam

On Fri, Apr 16, 2021 at 2:49 PM Beckerle, Mike <
mbeckerle@owlcyberdefense.com> wrote:

> This is actually very cool using ZIO for this. I have to learn more about
> ZIO.
>
>
> ________________________________
> From: John Wass <jw...@gmail.com>
> Sent: Monday, April 12, 2021 9:58 AM
> To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> > the code is here https://github.com/jw3/example-daffodil-debug
>
> There is now a complete console-based example for ZIO that demonstrates
> controlling the debug flow while distributing the current state to three
> "displays".
> 1. infoset at current step
> 2. diff of infoset against previous step
> 3. bit position and value of data.
>
> These displays are very rudimentary but demonstrate the ability to
> asynchronously populate multiple views while synchronously controlling the
> debug loop.
>
> > - The new protocol being informed by existing debugger and DAP is key
>
> Going to look deeper into how DAP might fit with Daffodil, and depending on
> how it shakes out will update the example to show integration.
>
> Some interesting links to start with
> - https://github.com/scalacenter/scala-debug-adapter
> - https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
> - https://github.com/microsoft/java-debug
>
> Also looking into the Java Debug Interface (JDI) for comparison.
>
>
> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:
>
> > Revisiting this post after doing some debugger related work and thinking
> > about debug protocol/adapters to connect external tooling to the debug
> > process.
> >
> > This comment is good
> >
> > > This also makes me wonder if an approach worth taking for the future of
> > > Daffodil schema debugging is developing a sort of "Daffodil Debug
> > > Protocol". I imagine it would be loosely based on DAP (which is
> > > essentially JSON message based) but could be targeted to the things that
> > > a DFDL schema debugger would really need. An added benefit with some
> > > sort of protocol is the debugger interface can be uncoupled from
> > > Daffodil itself, so we could implement a TUI/GUI/whatever in any
> > > language/GUI framework and just have it communicate the protocol over
> > > some form of IPC. Another benefit is that any future backends could
> > > implement this protocol and so a single debugger could hook into
> > > different backends without much issue. Unfortunately, defining such a
> > > protocol might be a large task, but we do have our existing debug
> > > infrastructure and things like DAP to guide its development/design.
> >
> > Some thoughts on this
> > - Defining the protocol will be a large task, but a minimal version should
> > get up and round-tripping quickly with a minimal subset of the protocol.
> > - The new protocol being informed by the existing debugger and DAP is key
> > - Uncoupling from Daffodil is key
> > - Adapt the Daffodil protocol to produce DAP after the fact so as not to
> > constrain Daffodil debugging capability
> > - We don't need to tie the protocol or adapters to a single framework;
> > implementations of the IO layer should be simple enough to support multiple
> > things (e.g. Akka, ZIO, "basic", ...)
> > - The current debugger lives in runtime1, but can we make an abstract API
> > that any runtime would implement?
> >
> > Maybe a solution is structured like this
> > - daffodil-debug-api:
> >   - protocol model
> >   - interfaces: debugger / IO adapter / etc
> >   - lives in daffodil repo (new subproject?)
> > - daffodil-debug-io-NAME
> >   - provides implementation of a specific IO adapter
> >   - multiple projects possible (daffodil-debugger-akka,
> > daffodil-debugger-zio, etc)
> >   - supported ones live in their own subprojects, but others can be plugged
> > in from external sources
> >   - ability to support multiple implementations reduces risk of lock-in
> > - debugger applications
> >   - maintained in external repositories
> >   - depending on the IO implementation these could execute in a separate
> > process or on a separate machine
> >   - like Steve said, could be any language / framework
> >
> > Three types of reference implementations / sample applications could also
> > guide the development of the API
> >   1. a replacement for the existing TUI debugger, expected to end up with
> > at minimum the same functionality as the current one.
> >   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
> >   3. an IDE integration
> >
> > Thoughts?
> >
> > Also I'm working on some reference implementations of these concepts using
> > Akka and ZIO. Not quite ready to talk through it yet, but the code is here
> > https://github.com/jw3/example-daffodil-debug
> >
> >
> >
> > On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
> > wrote:
> >
> >> Yep, something like that seems very reasonable for dealing with large
> >> infosets. But it still feels like we still run into usability issues.
> >> For example, what if a user wants to see more? We need some
> >> configuration options to increase what we've elided. It's not big, but
> >> every new thing that needs configuration adds complexity and decreases
> >> usability.
> >>
> >> And I think the only reason we are trying to spend effort eliding
> >> things is because we're limited to this gdb-like interface where you can
> >> only print out a little information at a time.
> >>
> >> I think what would really help is to dump this gdb interface and instead use
> >> multiple windows/views. As a really close example to what I imagine, I
> >> recently came across this hex editor:
> >>
> >> https://www.synalysis.net/
> >>
> >> The screenshots are a bit small so it's not super clear, but this tool
> >> has one view for the data in hex, and one view for a tree of parsed
> >> results (which is very similar to our infoset). The "infoset" view has
> >> information like offset/length/value, and can be related back to the
> >> data view to find the actual bits.
> >>
> >> I imagine the "next generation daffodil debugger" to look much like
> >> this. As data is parsed, the infoset view fills up. This view could act
> >> like a standard GUI tree so you could collapse sections or scroll around
> >> to show just the parts you care about, and have search capabilities to
> >> quickly jump around. The advantage here is you no longer really need
> >> automated eliding or heuristics for what the user *might* care about.
> >> You just show the whole thing and let user scroll around. As daffodil
> >> parses and backtracks, this tree grows or shrinks.
> >>
> >> I also imagine you could have a cursor moving around the hex view, so as
> >> daffodil moves around (e.g. scanning for delimiters, extracting
> >> integers), one could update this data view to show what daffodil is
> >> doing and where it is.
> >>
> >> I also imagine there could be other views as well. For example, a schema
> >> view to show where in the schema daffodil is, and to add/remove
> >> breakpoints. And an information view for things like variables, in-scope
> >> delimiters, PoU's, etc.
> >>
> >> The only reason I mention a debug protocol is that it would allow this GUI
> >> to be more easily written in something other than Java/Scala to take
> >> advantage of other GUI toolkits. It's been a long while since I've done
> >> anything with Java GUIs, but they seemed pretty poor the last I looked
> >> at them. It would even allow for a TUI, which Java has little/no support
> >> for. It also enables things like remote debugging if a socket IPC was
> >> used. Though I'm not sure all of that is necessary. Just thinking what
> >> would be ideal, and it can always be pared back.
> >>
> >>
> >> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> >> > I don't think of it as a daffodil debug protocol, but just a
> separation
> >> of concerns between display of information and the behaviors of
> >> parse/unparse that need to be points where users can pause, and data
> >> structures available to display.
> >> >
> >> > E.g., it is 100% a display issue that the infoset (shown as XML) is
> >> clumsy, too big, etc.  The infoset is available in the processor state,
> and
> >> one can examine the current node, enclosing node, prior sibling(s),
> >> following sibling(s), etc. One can elide contents that are too big for
> >> hexBinary, etc.
> >> >
> >> > I think this problem, how to display the infoset with sensible limits
> >> on sizing, is fairly easy to come up with some design for, that will at
> >> least be (1) always fairly small (2) much more useful in more cases. It
> >> won't be perfect but can be much better than what we do now.
> >> >
> >> > One sensible display "mode" should be that displaying the context
> >> > surrounding the current element (when parsing or unparsing) displays at
> >> > most N lines (N/2 before, N/2 after) with a maximum length of L
> >> > characters (settable within reason?)
> >> >
> >> > Sibling and enclosing nodes would be displayed eliding their contents
> >> to at most 1 line.
> >> >
> >> > Here's an example of what I mean. Displaying up to N=10 lines total:
> >> >
> >> > ...
> >> > <enclosingParent1>
> >> >    ...
> >> >    <priorSibling2>89ab782 ...</...>
> >> >    <priorSibling1>some text is here and some more text</...>
> >> >    <currentNode>value might be some big thing which needs to be elided ...</...>
> >> >    <followingSibling1> ... </...>
> >> >    ???
> >> > </enclosingParent1>
> >> > ???
> >> >
> >> > The </...> is just an idea to reduce XML matching end-tag clutter.
> >> >
> >> > The ... on a line alone or where element content would appear
> generally
> >> means 1 or more other siblings. The way the display above starts with
> ...
> >> means that this is a relative inner nest, not starting from the absolute
> >> root.
> >> >
> >> > The ... within simple content means that content is elided to fit on
> >> one line. Always follows some text characters to differentiate from the
> >> child-element context.
> >> >
> >> > The ??? means zero or more other siblings.
> >> >
> >> > I used bold italic above to point out that the current node would be
> >> highlighted somehow. Probably a way to do this that doesn't require
> display
> >> modes would be useful. E.g., a text marker like ">>>" as in:
> >> >
> >> > >>> <currentNode>value ... </...>
> >> >
> >> > might be better, particularly for a trace output being dumped to a
> text
> >> file.
> >> >
> >> > I made the above example an unparser kind of example by showing a
> >> following sibling that exists that is after the current node.
> >> >
> >> > I think the key concept is that any sibling node is displayed in a way
> >> that fits on one line.
> >> > E.g., even if the element name was really long, I'd suggest:
> >> >
> >> >   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> >> >
> >> > Where the element name itself gets elided because it is too long.
> >> >
> >> > A thought. Note that the above presentation is shown as quasi-XML, but
> >> there's nothing XML-specific about it. A JSON-friendly equivalent could
> be
> >> done as well:
> >> >
> >> > enclosingParent1 = {
> >> >    ...
> >> >    priorSibling2 = "89ab782..."
> >> >    priorSibling1 = "some text is here and some more text"
> >> >    currentNode = "value might be some big thing which needs to be elided ..."
> >> >    followingSibling1 = { ... }
> >> >    ???
> >> > }
> >> >
> >> > That's enough for 1 email thread on this debug topic.
> >> >
> >> >
> >> > ________________________________
> >> > From: Steve Lawrence <sl...@apache.org>
> >> > Sent: Tuesday, January 5, 2021 2:26 PM
> >> > To: dev@daffodil.apache.org <de...@daffodil.apache.org>
> >> > Subject: The future of the daffodil DFDL schema debugger?
> >> >
> >> >
> >> > Now that we're in a new year, I'd like to start a discussion about the
> >> > Daffodil DFDL Schema debugger and how it might be improved to be more
> >> > useful.
> >> >
> >> > Note that this is not the capability to debug Daffodil itself in
> >> > something like Eclipse/IntelliJ, but the ability for Daffodil to provide
> >> > enough extra information during a parse/unparse so that a schema
> >> > developer can get an idea of what Daffodil is doing. This makes it
> >> > easier for users (rather than developers) to determine why a schema
> >> > isn't giving the expected parse/unparse result (either because of bad
> >> > data or a faulty schema).
> >> >
> >> > The debugger is currently enabled by providing the --debug or
> >> > --trace flag in the CLI. More information about that here:
> >> >
> >> > https://daffodil.apache.org/debugger/
> >> >
> >> > This enables a TUI and commands somewhat similar to GDB, providing
> >> > things like breakpoints, stepping, displaying the current infoset,
> >> > displaying a dump of the data, etc.
> >> >
> >> > Although I find this tool pretty useful, it definitely has some
> glaring
> >> > issues.
> >> >
> >> > The most glaring to me is that it really isn't useful at all for
> >> > debugging unparse. The data dumps only include the main output stream,
> >> > so determining things like suspensions and buffered output is impossible.
> >> >
> >> > Another issue is the infoset output. When outputting the infoset, the
> >> > debugger currently just walks the entire thing and converts it to XML
> >> > and displays the XML. For large infosets, this is excessive and can make
> >> > it impossible to use, even with some configurations that limit how much of
> >> > that infoset is actually printed to the screen. Also things like large
> >> > hex binary blobs create excessive and unusable output.
> >> >
> >> > Another thing I feel is missing is a schema view. Right now it's very
> >> > difficult to know where in the schema Daffodil actually is.
> >> >
> >> > I think these issues just need some thought and improvement. One could
> >> > imagine a better way to stringify our unparse buffers for debug. One
> >> > could imagine a way to receive infoset state changes so the debugger can
> >> > track things like backtracks and remove infosets. One could imagine a way
> >> > to display the schema.
> >> >
> >> > We just need a better way to stringify the current state of the unparse
> >> > data including buffers, and we need a way for the debugger to receive
> >> > state change information about the infoset so it can update displays rather
> >> > than just constantly printing the entire infoset.
> >> >
> >> > However, I think another big issue is just usability in general. I
> >> > think the CLI usage is reasonable, but it's not always user friendly,
> >> > and it is difficult to view multiple things at the same time. I think
> >> > because of this very few people even use this tool. So this is
> >> > perhaps something worth focusing on.
> >> >
> >> > My first thought on improving this usability issue would be to implement
> >> > the Debug Adapter Protocol (DAP)
> >> > (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
> >> > which many IDEs implement. With this implemented, Daffodil could be
> >> > plugged in to any IDE that supports it and essentially get debugging for
> >> > free, without the need to worry about the GUI elements.
> >> >
> >> > I do have concerns that this just wouldn't have enough of the functionality
> >> > that we'd really need. For example, DAP really only has the ability to show
> >> > code (Daffodil's equivalent is the DFDL schema). There isn't a way to
> >> > show a live view of the infoset or data. Most DAP IDEs do have a
> >> > console output, so we could potentially make it so the console output is
> >> > a live view of infoset/data. But I'm not even sure most DAP-friendly
> >> > IDEs could support this kind of console output. Does anyone have
> >> > familiarity with DAP IDEs and what kinds of console capabilities are
> >> > available?
> >> >
> >> > I also looked into TUI libraries with the idea that we could just extend
> >> > our current debugger user interface to be a bit friendlier.
> >> > Unfortunately, there aren't too many Java/Scala TUI libraries and those
> >> > that do exist don't have Apache-friendly licenses. We also want to be
> >> > careful about increasing dependencies just for a debugger that many people
> >> > might not use, so large graphics libraries are probably out of the
> >> > question.
> >> >
> >> > This also makes me wonder if an approach worth taking for the future of
> >> > Daffodil schema debugging is developing a sort of "Daffodil Debug
> >> > Protocol". I imagine it would be loosely based on DAP (which is
> >> > essentially JSON message based) but could be targeted to the things that
> >> > a DFDL schema debugger would really need. An added benefit with some
> >> > sort of protocol is the debugger interface can be uncoupled from
> >> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
> >> > language/GUI framework and just have it communicate the protocol over
> >> > some form of IPC. Another benefit is that any future backends could
> >> > implement this protocol and so a single debugger could hook into
> >> > different backends without much issue. Unfortunately, defining such a
> >> > protocol might be a large task, but we do have our existing debug
> >> > infrastructure and things like DAP to guide its development/design.
> >> >
> >> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
> we
> >> > really just need the few improvements mentioned to the existing
> >> > debugger. Is that enough to make it usable? Or is an entirely
> different
> >> > approach needed to debugging schemas?
> >> >
> >>
> >>
>

Re: The future of the daffodil DFDL schema debugger?

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
This is actually very cool using ZIO for this. I have to learn more about ZIO.


________________________________
From: John Wass <jw...@gmail.com>
Sent: Monday, April 12, 2021 9:58 AM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: Re: The future of the daffodil DFDL schema debugger?

> the code is here https://github.com/jw3/example-daffodil-debug

There is now a complete console-based example for ZIO that demonstrates
controlling the debug flow while distributing the current state to three
"displays".
1. infoset at current step
2. diff of infoset against previous step
3. bit position and value of data.

These displays are very rudimentary but demonstrate the ability to
asynchronously populate multiple views while synchronously controlling the
debug loop.
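
For readers who have not looked at the linked repository, the pattern described
above can be sketched roughly as follows. This is only an illustration in plain
Python with threads and queues, not the ZIO code from the example project and
not any Daffodil API: the DebugState shape, the three view functions, and the
"step"/"quit" commands are all hypothetical names chosen for this sketch.

```python
import difflib
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class DebugState:
    """Snapshot published after each step (hypothetical shape, not a Daffodil type)."""
    step: int
    infoset: str       # infoset rendered as text
    bit_position: int  # current position in the data, in bits
    data: bytes        # the input data being parsed


def infoset_view(state, previous):
    # Display 1: the infoset at the current step.
    return f"[step {state.step}] infoset:\n{state.infoset}"


def diff_view(state, previous):
    # Display 2: lines added or removed relative to the previous step.
    before = previous.infoset.splitlines() if previous else []
    after = state.infoset.splitlines()
    changed = [l for l in difflib.ndiff(before, after) if l[:2] in ("+ ", "- ")]
    return f"[step {state.step}] diff: " + ", ".join(changed)


def data_view(state, previous):
    # Display 3: bit position and the value of the byte at that position.
    index = state.bit_position // 8
    shown = f"0x{state.data[index]:02x}" if index < len(state.data) else "end of data"
    return f"[step {state.step}] bit {state.bit_position}: {shown}"


class Display(threading.Thread):
    """A view that renders states from its own inbox, independently of the loop."""

    def __init__(self, render):
        super().__init__(daemon=True)
        self.render = render
        self.inbox = queue.Queue()
        self.lines = []

    def run(self):
        previous = None
        while True:
            state = self.inbox.get()
            if state is None:  # sentinel: the debug session is over
                break
            self.lines.append(self.render(state, previous))
            previous = state


def run_debug_loop(states, displays, commands):
    """Step synchronously; publishing to the displays never blocks the loop."""
    for d in displays:
        d.start()
    steps_taken = 0
    for state, command in zip(states, commands):
        for d in displays:
            d.inbox.put(state)
        steps_taken += 1
        if command == "quit":  # a real debugger would wait for user input here
            break
    for d in displays:
        d.inbox.put(None)
    for d in displays:
        d.join()
    return steps_taken


def demo():
    data = b"\x01\x02"
    states = [
        DebugState(1, "<a>1</a>", 8, data),
        DebugState(2, "<a>1</a>\n<b>2</b>", 16, data),
    ]
    displays = [Display(infoset_view), Display(diff_view), Display(data_view)]
    run_debug_loop(states, displays, ["step", "step"])
    return [d.lines for d in displays]
```

Each display only sees the stream of states, so adding a fourth view (say, a
schema location view) would not require touching the loop itself.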

> - The new protocol being informed by existing debugger and DAP is key

Going to look deeper into how DAP might fit with Daffodil, and depending on
how it shakes out will update the example to show integration.

Some interesting links to start with
- https://github.com/scalacenter/scala-debug-adapter
- https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
- https://github.com/microsoft/java-debug

Also looking into the Java Debug Interface (JDI) for comparison.


On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:

> Revisiting this post after doing some debugger related work and thinking
> about debug protocol/adapters to connect external tooling to the debug
> process.
>
> This comment is good
>
> > This also makes me wonder if an approach worth taking for the future of
> > Daffodil schema debugging is developing a sort of "Daffodil Debug Protocol".
> > I imagine it would be loosely based on DAP (which is essentially JSON
> > message based) but could be targeted to the things that a DFDL schema
> > debugger would really need. An added benefit with some sort of protocol
> > is the debugger interface can be uncoupled from Daffodil itself, so we
> > could implement a TUI/GUI/whatever in any language/GUI framework and just
> > have it communicate the protocol over some form of IPC. Another benefit
> > is that any future backends could implement this protocol and so a single
> > debugger could hook into different backends without much issue.
> > Unfortunately, defining such a protocol might be a large task, but we do
> > have our existing debug infrastructure and things like DAP to guide its
> > development/design.
>
> Some thoughts on this
> - Defining the protocol will be a large task, but a minimal version should
> get up and round-tripping quickly with a minimal subset of the protocol.
> - The new protocol being informed by the existing debugger and DAP is key
> - Uncoupling from Daffodil is key
> - Adapt the Daffodil protocol to produce DAP after the fact so as not to
> constrain Daffodil debugging capability
> - We don't need to tie the protocol or adapters to a single framework;
> implementations of the IO layer should be simple enough to support multiple
> things (e.g. Akka, ZIO, "basic", ...)
> - The current debugger lives in runtime1, but can we make an abstract API
> that any runtime would implement?
>
> Maybe a solution is structured like this
> - daffodil-debug-api:
>   - protocol model
>   - interfaces: debugger / IO adapter / etc
>   - lives in daffodil repo (new subproject?)
> - daffodil-debug-io-NAME
>   - provides implementation of a specific IO adapter
>   - multiple projects possible (daffodil-debugger-akka,
> daffodil-debugger-zio, etc)
>   - supported ones live in their own subprojects, but others can be plugged
> in from external sources
>   - ability to support multiple implementations reduces risk of lock-in
> - debugger applications
>   - maintained in external repositories
>   - depending on the IO implementation these could execute in a separate
> process or on a separate machine
>   - like Steve said, could be any language / framework
>
> Three types of reference implementations / sample applications could also
> guide the development of the API
>   1. a replacement for the existing TUI debugger, expected to end up with
> at minimum the same functionality as the current one.
>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
>   3. an IDE integration
>
> Thoughts?
>
> Also I'm working on some reference implementations of these concepts using
> Akka and ZIO. Not quite ready to talk through it yet, but the code is here
> https://github.com/jw3/example-daffodil-debug
>
>
>
> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
> wrote:
>
>> Yep, something like that seems very reasonable for dealing with large
>> infosets. But it still feels like we still run into usability issues.
>> For example, what if a user wants to see more? We need some
>> configuration options to increase what we've elided. It's not big, but
>> every new thing that needs configuration adds complexity and decreases
>> usability.
>>
>> And I think the only reason we are trying to spend effort eliding
>> things is because we're limited to this gdb-like interface where you can
>> only print out a little information at a time.
>>
>> I think what would really help is to dump this gdb interface and instead use
>> multiple windows/views. As a really close example to what I imagine, I
>> recently came across this hex editor:
>>
>> https://www.synalysis.net/
>>
>> The screenshots are a bit small so it's not super clear, but this tool
>> has one view for the data in hex, and one view for a tree of parsed
>> results (which is very similar to our infoset). The "infoset" view has
>> information like offset/length/value, and can be related back to the
>> data view to find the actual bits.
>>
>> I imagine the "next generation daffodil debugger" to look much like
>> this. As data is parsed, the infoset view fills up. This view could act
>> like a standard GUI tree so you could collapse sections or scroll around
>> to show just the parts you care about, and have search capabilities to
>> quickly jump around. The advantage here is you no longer really need
>> automated eliding or heuristics for what the user *might* care about.
>> You just show the whole thing and let user scroll around. As daffodil
>> parses and backtracks, this tree grows or shrinks.
>>
>> I also imagine you could have a cursor moving around the hex view, so as
>> daffodil moves around (e.g. scanning for delimiters, extracting
>> integers), one could update this data view to show what daffodil is
>> doing and where it is.
>>
>> I also imagine there could be other views as well. For example, a schema
>> view to show where in the schema daffodil is, and to add/remove
>> breakpoints. And an information view for things like variables, in-scope
>> delimiters, PoU's, etc.
>>
>> The only reason I mention a debug protocol is that it would allow this GUI
>> to be more easily written in something other than Java/Scala to take
>> advantage of other GUI toolkits. It's been a long while since I've done
>> anything with Java GUIs, but they seemed pretty poor the last I looked
>> at them. It would even allow for a TUI, which Java has little/no support
>> for. It also enables things like remote debugging if a socket IPC was
>> used. Though I'm not sure all of that is necessary. Just thinking what
>> would be ideal, and it can always be pared back.
>>
>>
>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>> > I don't think of it as a daffodil debug protocol, but just a separation
>> of concerns between display of information and the behaviors of
>> parse/unparse that need to be points where users can pause, and data
>> structures available to display.
>> >
>> > E.g., it is 100% a display issue that the infoset (shown as XML) is
>> clumsy, too big, etc.  The infoset is available in the processor state, and
>> one can examine the current node, enclosing node, prior sibling(s),
>> following sibling(s), etc. One can elide contents that are too big for
>> hexBinary, etc.
>> >
>> > I think this problem, how to display the infoset with sensible limits
>> on sizing, is fairly easy to come up with some design for, that will at
>> least be (1) always fairly small (2) much more useful in more cases. It
>> won't be perfect but can be much better than what we do now.
>> >
>> > One sensible display "mode" should be that displaying the context
>> > surrounding the current element (when parsing or unparsing) displays at
>> > most N lines (N/2 before, N/2 after) with a maximum length of L
>> > characters (settable within reason?)
>> >
>> > Sibling and enclosing nodes would be displayed eliding their contents
>> to at most 1 line.
>> >
>> > Here's an example of what I mean. Displaying up to N=10 lines total:
>> >
>> > ...
>> > <enclosingParent1>
>> >    ...
>> >    <priorSibling2>89ab782 ...</...>
>> >    <priorSibling1>some text is here and some more text</...>
>> >    <currentNode>value might be some big thing which needs to be elided ...</...>
>> >    <followingSibling1> ... </...>
>> >    ???
>> > </enclosingParent1>
>> > ???
>> >
>> > The </...> is just an idea to reduce XML matching end-tag clutter.
>> >
>> > The ... on a line alone or where element content would appear generally
>> means 1 or more other siblings. The way the display above starts with ...
>> means that this is a relative inner nest, not starting from the absolute
>> root.
>> >
>> > The ... within simple content means that content is elided to fit on
>> one line. Always follows some text characters to differentiate from the
>> child-element context.
>> >
>> > The ??? means zero or more other siblings.
>> >
>> > I used bold italic above to point out that the current node would be
>> highlighted somehow. Probably a way to do this that doesn't require display
>> modes would be useful. E.g., a text marker like ">>>" as in:
>> >
>> >>>> <currentNode>value .... </...>
>> >
>> > might be better, particularly for a trace output being dumped to a text
>> file.
>> >
>> > I made the above example an unparser kind of example by showing a
>> following sibling that exists that is after the current node.
>> >
>> > I think the key concept is that any sibling node is displayed in a way
>> that fits on one line.
>> > E.g., even if the element name was really long, I'd suggest:
>> >
>> >   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>> >
>> > Where the element name itself gets elided because it is too long.
>> >
>> > A thought. Note that the above presentation is shown as quasi-XML, but
>> there's nothing XML-specific about it. A JSON-friendly equivalent could be
>> done as well:
>> >
>> > enclosingParent1 = {
>> >    ...
>> >    priorSibling2 = "89ab782..."
>> >    priorSibling1 = "some text is here and some more text"
>> >    currentNode = "value might be some big thing which needs to be
>> elided ..."
>> >    followingSibling1 = { ... }
>> >    ???
>> > }
>> >
>> > That's enough for 1 email thread on this debug topic.
>> >
>> >
>> > ________________________________
>> > From: Steve Lawrence <sl...@apache.org>
>> > Sent: Tuesday, January 5, 2021 2:26 PM
>> > To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> > Subject: The future of the daffodil DFDL schema debugger?
>> >
>> >
>> > Now that we're in a new year, I'd like to start a discussion about the
>> > Daffodil DFDL Schema debugger and how it might be improved to be more
>> > useful.
>> >
>> > Note that this is not the capabilities to debug Daffodil itself in
>> > something like Eclipse/IntelliJ, but the ability for Daffodil to provide
>> > enough extra information during a parse/unparse so that a schema
>> > developer can get an idea of what Daffodil is doing. This makes it
>> > easier for users (rather than developers) to determine why a schema
>> > isn't giving the expect parse/unparse result (either because of bad data
>> > or a faulty schema.
>> >
>> > The current state of the debugger is enabled by providing the --debug or
>> > --trace flags in the CLI. More information about that here:
>> >
>> > https://daffodil.apache.org/debugger/
>> >
>> > This enables a TUI and commands somewhat similar to GDB, providing thins
>> > like breakpoints, steps, displaying the current infoset, display a dump
>> > of the data, etc.
>> >
>> > Although I find this tool pretty useful, it definitely has some glaring
>> > issues.
>> >
>> > The most glaring to me is that it really isn't useful at all for
>> > debugging unparse. The data dumps only include then main outputstream,
>> > so determine things like suspensions and buffered output is impossible.
>> >
>> > Another issue is the infoset output. When outputting the infoset, the
>> > debugger currently just walks the entire thing and converts it to XML
>> > and displays the XML. For large infosets, this is excess and can make it
>> > impossible to use, even with some configurations the limit how much of
>> > that infoset is actually printed to the screen. Also things like large
>> > hex binary blobs create excessive and unusable output.
>> >
>> > Another thing I feel is missing is a schema view. Right now it's very
>> > difficult to know where in the schema Daffodil actually is.
>> >
>> > I think these issues just need some thought improvement. One could
>> > imagine a better way to stringify our unparse buffers for debug. One
>> > could image a way to receive infoset state changes so the debugger can
>> > track things like backtracks and remove infosets. One could image a way
>> > display the schema
>> >
>> > We just need a better way to stringify the current state of the unparse
>> > data including buffers, and we need a way to for the debugger to receive
>> > state change information about infoset so it can update displays rather
>> > than just constantly printing the entire infoset.
>> >
>> > However, I think another other big issue is just usability in general. I
>> > think the CLI usage is reasonable, but it's not always user friendly,
>> > and is difficult to view multiple things at the same time. I think
>> > because of this very few people even use this tool. So this this like
>> > perhaps something worth focus.
>> >
>> > My first thought to improving this usability issue would be to implement
>> > the Debug Adapter Protocol (DAP)
>> > (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
>> > which many IDE's implement. With this implemented, Daffodil could be
>> > plugged in to any IDE that supports it and essentially get debugging for
>> > free, without the need to worry about the GUI elements.
>> >
>> > I do have concerns that this just wouldn't have enough functionality
>> > that we'd really need. For example, DAP really only has ability show
>> > code (Daffodil's equivalent is the DFDL schema). There isn't a way to
>> > show a live view of the infoset or data. Most DAP IDE's do have a
>> > console output, so we could potentially make it so the console output is
>> > a live view of infoset/data. But I'm not even sure most DAP friendly
>> > IDE's could support this kindof console output. Does anyone have
>> > familiarity with DAP IDE's or and what kinds of console capabilities are
>> > available?
>> >
>> > I also looked into TUI libraries with the idea that we could just extend
>> > our current debugger user interface to be a bit friendlier.
>> > Unfortunately, there aren't too many Java/Scala TUI libraries and those
>> > that do exist don't have Apache friendly licenses. We also want to be
>> > careful about increase dependencies just for a debugger than many people
>> > might not use, so large graphics libraries are probably out of the
>> question.
>> >
>> > This allo makes me wonder if an approach worth taking for the future of
>> > Daffodil schema debugging is developing a sort of "Daffodil Debug
>> > Protocol". I imagine it would be loosely based on DAP (which is
>> > essentially JSON message based) but could be targeted to the things that
>> > a DFDL schema debugger would really need. An added benefit with some
>> > sort of protocol is the debugger interface can be uncoupled from
>> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
>> > language/GUI framework and just have it communicate the protocol over
>> > some form of IPC. Another benefit is that any future backends could
>> > implement this protocol and so a single debugger could hook into
>> > different backends without much issue. Unfortunately, defining such a
>> > protocol might be a large task, but we do have our existing debug
>> > infrastructure and things like DAP to guide its development/design.
>> >
>> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps we
>> > really just need the few improvements mentioned to the existing
>> > debugger. Is that enough to make it usable? Or is an entirely different
>> > approach needed to debugging schemas?
>> >
>>
>>

Re: The future of the daffodil DFDL schema debugger?

Posted by John Wass <jw...@gmail.com>.
> the code is here https://github.com/jw3/example-daffodil-debug

There is now a complete console-based example for Zio that demonstrates
controlling the debug flow while distributing the current state to three
"displays":
1. the infoset at the current step
2. a diff of the infoset against the previous step
3. the bit position and value of the data

These displays are very rudimentary but demonstrate the ability to
asynchronously populate multiple views while synchronously controlling the
debug loop.
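
For readers who have not opened the repository, the idea can be sketched
in a few lines. This is a minimal illustration in Python, not the Zio
code from the example project: StepState, run_steps and the three display
functions are hypothetical names, and bit positions are assumed to be
0-based. The sketch is purely synchronous; a real implementation would
push each frame to its views asynchronously.

```python
import difflib
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class StepState:
    """Hypothetical snapshot of the parse state at one debug step."""
    infoset_xml: str   # infoset rendered as text at this step
    bit_position: int  # 0-based bit offset into the data (assumed convention)
    data: bytes        # the input data being parsed


# A display turns (previous state, current state) into text for one view.
Display = Callable[[Optional[StepState], StepState], str]


def infoset_display(prev: Optional[StepState], cur: StepState) -> str:
    """View 1: the infoset at the current step."""
    return cur.infoset_xml


def diff_display(prev: Optional[StepState], cur: StepState) -> str:
    """View 2: unified diff of the infoset against the previous step."""
    before = prev.infoset_xml.splitlines() if prev else []
    after = cur.infoset_xml.splitlines()
    diff = difflib.unified_diff(
        before, after, fromfile="previous", tofile="current", lineterm=""
    )
    return "\n".join(diff)


def data_display(prev: Optional[StepState], cur: StepState) -> str:
    """View 3: bit position and the value of the byte containing it."""
    byte_index = cur.bit_position // 8
    if byte_index < len(cur.data):
        value = f"0x{cur.data[byte_index]:02x}"
    else:
        value = "<end of data>"
    return f"bit {cur.bit_position}: byte {byte_index} = {value}"


def run_steps(
    states: Iterable[StepState], displays: List[Display]
) -> Iterator[List[str]]:
    """Yield one frame (one string per display) for each debug step.

    Because this is a generator, the caller decides when the next step
    happens, which stands in for the synchronous control of the loop.
    """
    prev: Optional[StepState] = None
    for cur in states:
        yield [display(prev, cur) for display in displays]
        prev = cur
```

Calling next() on the generator returned by run_steps advances exactly
one step and returns the text for all three views, so the controlling
code never has to know how many views are attached.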

> - The new protocol being informed by existing debugger and DAP is key

I'm going to look deeper into how DAP might fit with Daffodil and,
depending on how it shakes out, will update the example to show integration.

Some interesting links to start with:
- https://github.com/scalacenter/scala-debug-adapter
- https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
- https://github.com/microsoft/java-debug

Also looking into the Java Debug Interface (JDI) for comparison.


On Thu, Apr 8, 2021 at 12:36 PM John Wass <jw...@gmail.com> wrote:

> Revisiting this post after doing some debugger related work and thinking
> about debug protocol/adapters to connect external tooling to the debug
> process.
>
> This comment is good
>
> > This also makes me wonder if an approach worth taking for the future of
> Daffodil schema debugging is developing a sort of "Daffodil Debug Protocol".
> I imagine it would be loosely based on DAP (which is  essentially JSON
> message based) but could be targeted to the things that a DFDL schema
> debugger would really need. An added benefit with some  sort of protocol
> is the debugger interface can be uncoupled from Daffodil itself, so we
> could implement a TUI/GUI/whatever in any  language/GUI framework and just
> have it communicate the protocol over some form of IPC. Another benefit
> is that any future backends could implement this protocol and so a single
> debugger could hook into different backends without much issue.
> Unfortunately, defining such a protocol might be a large task, but we do
> have our existing debug infrastructure and things like DAP to guide its
> development/design.
>
> Some thoughts on this:
> - Defining the full protocol will be a large task, but a minimal subset
> should be up and round-tripping quickly.
> - The new protocol being informed by the existing debugger and DAP is key
> - Uncoupling from Daffodil is key
> - Adapt the Daffodil protocol to produce DAP after the fact so as not to
> constrain Daffodil debugging capability
> - We don't need to tie the protocol or adapters to a single framework;
> implementations of the IO layer should be simple enough to support multiple
> things (e.g. Akka, Zio, "basic" ...)
> - The current debugger lives in runtime1, but can we make an abstract API
> that any runtime would implement?
>
> Maybe a solution is structured like this
> - daffodil-debug-api:
>   - protocol model
>   - interfaces: debugger / IO adapter / etc
>   - lives in daffodil repo (new subproject?)
> - daffodil-debug-io-NAME
>   - provides implementation of a specific IO adapter
>   - multiple projects possible (daffodil-debugger-akka,
> daffodil-debugger-zio, etc)
>   - supported ones live in their own subprojects, but others can be plugged
> in from external sources
>   - ability to support multiple implementations reduces risk of lock-in
> - debugger applications
>   - maintained in external repositories
>   - depending on the IO implementation these could execute in a separate
> process or on a separate machine
>   - like Steve said, could be any language / framework
>
> Three types of reference implementations / sample applications could also
> guide the development of the API
>   1. a replacement for the existing TUI debugger, expected to end up with
> at minimum the same functionality as the current one.
>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
>   3. an IDE integration
>
> Thoughts?
>
> Also I'm working on some reference implementations of these concepts using
> Akka and Zio.  Not quite ready to talk through it yet, but the code is here
> https://github.com/jw3/example-daffodil-debug
>
>
>
> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org>
> wrote:
>
>> Yep, something like that seems very reasonable for dealing with large
>> infosets. But it feels like we still run into usability issues.
>> For example, what if a user wants to see more? We need some
>> configuration options to increase what we've elided. It's not big, but
>> every new thing that needs configuration adds complexity and decreases
>> usability.
>>
>> And I think the only reason we are trying to spend effort eliding
>> things is because we're limited to this gdb-like interface where you can
>> only print out a little information at a time.
>>
>> I think what would really help is to dump this gdb interface and instead use
>> multiple windows/views. As a really close example to what I imagine, I
>> recently came across this hex editor:
>>
>> https://www.synalysis.net/
>>
>> The screenshots are a bit small so it's not super clear, but this tool
>> has one view for the data in hex, and one view for a tree of parsed
>> results (which is very similar to our infoset). The "infoset" view has
>> information like offset/length/value, and can be related back to the
>> data view to find the actual bits.
>>
>> I imagine the "next generation daffodil debugger" to look much like
>> this. As data is parsed, the infoset view fills up. This view could act
>> like a standard GUI tree so you could collapse sections or scroll around
>> to show just the parts you care about, and have search capabilities to
>> quickly jump around. The advantage here is you no longer really need
>> automated eliding or heuristics for what the user *might* care about.
>> You just show the whole thing and let the user scroll around. As daffodil
>> parses and backtracks, this tree grows or shrinks.
>>
>> I also imagine you could have a cursor moving around the hex view, so as
>> daffodil moves around (e.g. scanning for delimiters, extracting
>> integers), one could update this data view to show what daffodil is
>> doing and where it is.
>>
>> I also imagine there could be other views as well. For example, a schema
>> view to show where in the schema daffodil is, and to add/remove
>> breakpoints. And an information view for things like variables, in-scope
>> delimiters, PoU's, etc.
>>
>> The only reason I mention a debug protocol is that it would allow this GUI
>> to be more easily written in something other than Java/Scala to take
>> advantage of other GUI toolkits. It's been a long while since I've done
>> anything with Java GUIs, but they seemed pretty poor the last time I looked
>> at them. It would even allow for a TUI, which Java has little/no support
>> for. It also enables things like remote debugging if a socket IPC were
>> used. Though I'm not sure all of that is necessary. Just thinking what
>> would be ideal, and it can always be pared back.
>>
>>
>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>> > I don't think of it as a daffodil debug protocol, but just a separation
>> of concerns between display of information and the behaviors of
>> parse/unparse that need to be points where users can pause, and data
>> structures available to display.
>> >
>> > E.g., it is 100% a display issue that the infoset (shown as XML) is
>> clumsy, too big, etc.  The infoset is available in the processor state, and
>> one can examine the current node, enclosing node, prior sibling(s),
>> following sibling(s), etc. One can elide contents that are too big for
>> hexBinary, etc.
>> >
>> > I think this problem, how to display the infoset with sensible limits
>> on sizing, is fairly easy to come up with some design for, that will at
>> least be (1) always fairly small (2) much more useful in more cases. It
>> won't be perfect but can be much better than what we do now.
>> >
>> > One sensible display "mode" should be that displaying the context
>> surrounding the current element (when parsing or unparsing) displays at
>> most N-lines. (N/2 before, N/2 after) with a maximum length of L characters
>> (settable within reason ?)
>> >
>> > Sibling and enclosing nodes would be displayed eliding their contents
>> to at most 1 line.
>> >
>> > Here's an example of what I mean. Displaying up to M=10 lines total:
>> >
>> > ...
>> > <enclosingParent1>
>> >    ...
>> >    <priorSibling2>89ab782 ...</...>
>> >    <priorSibling1>some text is here and some more text</...>
>> >    <currentNode>value might be some big thing which needs to be elided
>> ...</...>
>> >    <followingSibling1> ... </...>
>> >    ???
>> > </enclosingParent1>
>> > ???
>> >
>> > The </...> is just an idea to reduce XML matching end-tag clutter.
>> >
>> > The ... on a line alone or where element content would appear generally
>> means 1 or more other siblings. The way the display above starts with ...
>> means that this is a relative inner nest, not starting from the absolute
>> root.
>> >
>> > The ... within simple content means that content is elided to fit on
>> one line. Always follows some text characters to differentiate from the
>> child-element context.
>> >
>> > The ??? means zero or more other siblings.
>> >
>> > I used bold italic above to point out that the current node would be
>> highlighted somehow. Probably a way to do this that doesn't require display
>> modes would be useful. E.g., a text marker like ">>>" as in:
>> >
>> >>>> <currentNode>value .... </...>
>> >
>> > might be better, particularly for a trace output being dumped to a text
>> file.
>> >
>> > I made the above example an unparser kind of example by showing a
>> following sibling that exists that is after the current node.
>> >
>> > I think the key concept is that any sibling node is displayed in a way
>> that fits on one line.
>> > E.g., even if the element name was really long, I'd suggest:
>> >
>> >   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>> >
>> > Where the element name itself gets elided because it is too long.
>> >
>> > A thought. Note that the above presentation is shown as quasi-XML, but
>> there's nothing XML-specific about it. A JSON-friendly equivalent could be
>> done as well:
>> >
>> > enclosingParent1 = {
>> >    ...
>> >    priorSibling2 = "89ab782..."
>> >    priorSibling1 = "some text is here and some more text"
>> >    currentNode = "value might be some big thing which needs to be
>> elided ..."
>> >    followingSibling1 = { ... }
>> >    ???
>> > }
>> >
>> > That's enough for 1 email thread on this debug topic.
>> >
>> >
>> > ________________________________
>> > From: Steve Lawrence <sl...@apache.org>
>> > Sent: Tuesday, January 5, 2021 2:26 PM
>> > To: dev@daffodil.apache.org <de...@daffodil.apache.org>
>> > Subject: The future of the daffodil DFDL schema debugger?
>> >
>> >
>> > Now that we're in a new year, I'd like to start a discussion about the
>> > Daffodil DFDL Schema debugger and how it might be improved to be more
>> > useful.
>> >
>> > Note that this is not the capabilities to debug Daffodil itself in
>> > something like Eclipse/IntelliJ, but the ability for Daffodil to provide
>> > enough extra information during a parse/unparse so that a schema
>> > developer can get an idea of what Daffodil is doing. This makes it
>> > easier for users (rather than developers) to determine why a schema
>> > isn't giving the expect parse/unparse result (either because of bad data
>> > or a faulty schema.
>> >
>> > The current state of the debugger is enabled by providing the --debug or
>> > --trace flags in the CLI. More information about that here:
>> >
>> > https://daffodil.apache.org/debugger/
>> >
>> > This enables a TUI and commands somewhat similar to GDB, providing thins
>> > like breakpoints, steps, displaying the current infoset, display a dump
>> > of the data, etc.
>> >
>> > Although I find this tool pretty useful, it definitely has some glaring
>> > issues.
>> >
>> > The most glaring to me is that it really isn't useful at all for
>> > debugging unparse. The data dumps only include then main outputstream,
>> > so determine things like suspensions and buffered output is impossible.
>> >
>> > Another issue is the infoset output. When outputting the infoset, the
>> > debugger currently just walks the entire thing and converts it to XML
>> > and displays the XML. For large infosets, this is excess and can make it
>> > impossible to use, even with some configurations the limit how much of
>> > that infoset is actually printed to the screen. Also things like large
>> > hex binary blobs create excessive and unusable output.
>> >
>> > Another thing I feel is missing is a schema view. Right now it's very
>> > difficult to know where in the schema Daffodil actually is.
>> >
>> > I think these issues just need some thought improvement. One could
>> > imagine a better way to stringify our unparse buffers for debug. One
>> > could image a way to receive infoset state changes so the debugger can
>> > track things like backtracks and remove infosets. One could image a way
>> > display the schema
>> >
>> > We just need a better way to stringify the current state of the unparse
>> > data including buffers, and we need a way to for the debugger to receive
>> > state change information about infoset so it can update displays rather
>> > than just constantly printing the entire infoset.
>> >
>> > However, I think another other big issue is just usability in general. I
>> > think the CLI usage is reasonable, but it's not always user friendly,
>> > and is difficult to view multiple things at the same time. I think
>> > because of this very few people even use this tool. So this this like
>> > perhaps something worth focus.
>> >
>> > My first thought to improving this usability issue would be to implement
>> > the Debug Adapter Protocol (DAP)
>> > (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
>> > which many IDE's implement. With this implemented, Daffodil could be
>> > plugged in to any IDE that supports it and essentially get debugging for
>> > free, without the need to worry about the GUI elements.
>> >
>> > I do have concerns that this just wouldn't have enough functionality
>> > that we'd really need. For example, DAP really only has ability show
>> > code (Daffodil's equivalent is the DFDL schema). There isn't a way to
>> > show a live view of the infoset or data. Most DAP IDE's do have a
>> > console output, so we could potentially make it so the console output is
>> > a live view of infoset/data. But I'm not even sure most DAP friendly
>> > IDE's could support this kindof console output. Does anyone have
>> > familiarity with DAP IDE's or and what kinds of console capabilities are
>> > available?
>> >
>> > I also looked into TUI libraries with the idea that we could just extend
>> > our current debugger user interface to be a bit friendlier.
>> > Unfortunately, there aren't too many Java/Scala TUI libraries and those
>> > that do exist don't have Apache friendly licenses. We also want to be
>> > careful about increase dependencies just for a debugger than many people
>> > might not use, so large graphics libraries are probably out of the
>> question.
>> >
>> > This allo makes me wonder if an approach worth taking for the future of
>> > Daffodil schema debugging is developing a sort of "Daffodil Debug
>> > Protocol". I imagine it would be loosely based on DAP (which is
>> > essentially JSON message based) but could be targeted to the things that
>> > a DFDL schema debugger would really need. An added benefit with some
>> > sort of protocol is the debugger interface can be uncoupled from
>> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
>> > language/GUI framework and just have it communicate the protocol over
>> > some form of IPC. Another benefit is that any future backends could
>> > implement this protocol and so a single debugger could hook into
>> > different backends without much issue. Unfortunately, defining such a
>> > protocol might be a large task, but we do have our existing debug
>> > infrastructure and things like DAP to guide its development/design.
>> >
>> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps we
>> > really just need the few improvements mentioned to the existing
>> > debugger. Is that enough to make it usable? Or is an entirely different
>> > approach needed to debugging schemas?
>> >
>>
>>

Re: The future of the daffodil DFDL schema debugger?

Posted by John Wass <jw...@gmail.com>.
Revisiting this post after doing some debugger-related work and thinking
about debug protocol/adapters to connect external tooling to the debug
process.

This comment is good:

> This also makes me wonder if an approach worth taking for the future of
Daffodil schema debugging is developing a sort of "Daffodil Debug Protocol".
I imagine it would be loosely based on DAP (which is  essentially JSON
message based) but could be targeted to the things that a DFDL schema
debugger would really need. An added benefit with some  sort of protocol is
the debugger interface can be uncoupled from Daffodil itself, so we could
implement a TUI/GUI/whatever in any  language/GUI framework and just have
it communicate the protocol over some form of IPC. Another benefit is that
any future backends could implement this protocol and so a single debugger
could hook into different backends without much issue. Unfortunately,
defining such a protocol might be a large task, but we do have our existing
debug infrastructure and things like DAP to guide its development/design.

Some thoughts on this:
- Defining the full protocol will be a large task, but a minimal subset
should be up and round-tripping quickly.
- The new protocol being informed by the existing debugger and DAP is key
- Uncoupling from Daffodil is key
- Adapt the Daffodil protocol to produce DAP after the fact so as not to
constrain Daffodil debugging capability
- We don't need to tie the protocol or adapters to a single framework;
implementations of the IO layer should be simple enough to support multiple
things (e.g. Akka, Zio, "basic" ...)
- The current debugger lives in runtime1, but can we make an abstract API
that any runtime would implement?
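
To make the "produce DAP after the fact" point concrete, here is a
minimal sketch in Python. The native event type DaffodilStopped and the
function names are hypothetical; nothing like them exists in Daffodil
today. The DAP side uses only the standard "stopped" event shape and the
DAP base-protocol framing (a Content-Length header, a blank line, then
the JSON body).

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class DaffodilStopped:
    """Hypothetical native event: processing paused at a schema location."""
    reason: str        # e.g. "step" or "breakpoint"
    schema_file: str   # where in the DFDL schema we stopped
    schema_line: int
    bit_position: int  # Daffodil-specific detail with no DAP 'stopped' field


def to_dap_stopped(
    event: DaffodilStopped, seq: int, thread_id: int = 1
) -> Dict[str, Any]:
    """Project the native event onto a DAP 'stopped' event.

    Only the fields DAP knows about survive; the native event keeps the
    rest, so the native protocol is not limited by what DAP can carry.
    """
    return {
        "seq": seq,
        "type": "event",
        "event": "stopped",
        "body": {"reason": event.reason, "threadId": thread_id},
    }


def frame_dap(message: Dict[str, Any]) -> bytes:
    """Encode a message using the DAP base protocol framing."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body
```

In DAP the client learns the source location through a follow-up
stackTrace request rather than from the stopped event itself, so an
adapter built this way would answer that request from schema_file and
schema_line. The bit position would need a custom event or a console
view, which is the kind of gap a native protocol avoids.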

Maybe a solution is structured like this
- daffodil-debug-api:
  - protocol model
  - interfaces: debugger / IO adapter / etc
  - lives in daffodil repo (new subproject?)
- daffodil-debug-io-NAME
  - provides implementation of a specific IO adapter
  - multiple projects possible (daffodil-debugger-akka,
daffodil-debugger-zio, etc)
  - supported ones live in their own subprojects, but others can be plugged
in from external sources
  - ability to support multiple implementations reduces risk of lock-in
- debugger applications
  - maintained in external repositories
  - depending on the IO implementation these could execute in a separate
process or on a separate machine
  - like Steve said, could be any language / framework
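
The layering above can be sketched roughly as follows, in Python for
brevity. Every name here (Command, Event, IOAdapter, Debugger, serve and
the two concrete classes) is a hypothetical stand-in for illustration;
none of it is an existing Daffodil API. The protocol model and the two
interfaces would belong to the API project, and the in-memory adapter
plays the role of one daffodil-debug-io-NAME implementation.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass
from typing import Iterable, List


# --- protocol model (would live in the API project) ---
@dataclass(frozen=True)
class Command:
    name: str  # e.g. "step" or "continue"


@dataclass(frozen=True)
class Event:
    kind: str          # e.g. "stopped"
    bit_position: int  # where the runtime is in the data


# --- interfaces (would live in the API project) ---
class IOAdapter(ABC):
    """Moves protocol messages between a runtime and a debugger front end."""

    @abstractmethod
    def receive(self) -> Command: ...

    @abstractmethod
    def send(self, event: Event) -> None: ...


class Debugger(ABC):
    """What any runtime would implement to be debuggable."""

    @abstractmethod
    def handle(self, command: Command) -> Event: ...


# --- one IO implementation (a stand-in for an Akka or Zio adapter) ---
class InMemoryAdapter(IOAdapter):
    def __init__(self, commands: Iterable[Command]) -> None:
        self._commands: List[Command] = list(commands)
        self.sent: List[str] = []  # JSON text, as it would go over the wire

    def receive(self) -> Command:
        return self._commands.pop(0)

    def send(self, event: Event) -> None:
        self.sent.append(json.dumps(asdict(event)))


# --- a toy runtime: every "step" consumes 8 bits ---
class CountingDebugger(Debugger):
    def __init__(self) -> None:
        self._bit = 0

    def handle(self, command: Command) -> Event:
        if command.name == "step":
            self._bit += 8
        return Event("stopped", self._bit)


def serve(debugger: Debugger, adapter: IOAdapter, count: int) -> None:
    """Process a fixed number of commands; knows nothing about transport."""
    for _ in range(count):
        adapter.send(debugger.handle(adapter.receive()))
```

Because serve depends only on the two interfaces, swapping the in-memory
adapter for a socket-based one would not touch the runtime side, which is
how the lock-in risk mentioned above would be kept low.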

Three types of reference implementations / sample applications could also
guide the development of the API
  1. a replacement for the existing TUI debugger, expected to end up with
at minimum the same functionality as the current one.
  2. a standalone GUI (JavaFX, Scala.js, ..) debugger
  3. an IDE integration

Thoughts?

Also I'm working on some reference implementations of these concepts using
Akka and Zio.  Not quite ready to talk through it yet, but the code is here
https://github.com/jw3/example-daffodil-debug



On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <sl...@apache.org> wrote:

> Yep, something like that seems very reasonable for dealing with large
> infosets. But it feels like we still run into usability issues.
> For example, what if a user wants to see more? We need some
> configuration options to increase what we've elided. It's not big, but
> every new thing that needs configuration adds complexity and decreases
> usability.
>
> And I think the only reason we are trying to spend effort eliding
> things is because we're limited to this gdb-like interface where you can
> only print out a little information at a time.
>
> I think what would really help is to dump this gdb interface and instead use
> multiple windows/views. As a really close example to what I imagine, I
> recently came across this hex editor:
>
> https://www.synalysis.net/
>
> The screenshots are a bit small so it's not super clear, but this tool
> has one view for the data in hex, and one view for a tree of parsed
> results (which is very similar to our infoset). The "infoset" view has
> information like offset/length/value, and can be related back to the
> data view to find the actual bits.
>
> I imagine the "next generation daffodil debugger" to look much like
> this. As data is parsed, the infoset view fills up. This view could act
> like a standard GUI tree so you could collapse sections or scroll around
> to show just the parts you care about, and have search capabilities to
> quickly jump around. The advantage here is you no longer really need
> automated eliding or heuristics for what the user *might* care about.
> You just show the whole thing and let the user scroll around. As daffodil
> parses and backtracks, this tree grows or shrinks.
>
> I also imagine you could have a cursor moving around the hex view, so as
> daffodil moves around (e.g. scanning for delimiters, extracting
> integers), one could update this data view to show what daffodil is
> doing and where it is.
>
> I also imagine there could be other views as well. For example, a schema
> view to show where in the schema daffodil is, and to add/remove
> breakpoints. And an information view for things like variables, in-scope
> delimiters, PoU's, etc.
>
> The only reason I mention a debug protocol is that it would allow this GUI
> to be more easily written in something other than Java/Scala to take
> advantage of other GUI toolkits. It's been a long while since I've done
> anything with Java GUIs, but they seemed pretty poor the last time I looked
> at them. It would even allow for a TUI, which Java has little/no support
> for. It also enables things like remote debugging if a socket IPC were
> used. Though I'm not sure all of that is necessary. Just thinking what
> would be ideal, and it can always be pared back.
>
>
> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> > I don't think of it as a daffodil debug protocol, but just a separation
> of concerns between display of information and the behaviors of
> parse/unparse that need to be points where users can pause, and data
> structures available to display.

Re: The future of the daffodil DFDL schema debugger?

Posted by Steve Lawrence <sl...@apache.org>.
Yep, something like that seems very reasonable for dealing with large
infosets. But it still feels like we run into usability issues. For
example, what if a user wants to see more? We need some configuration
options to show more of what we've elided. It's not big, but every new
thing that needs configuration adds complexity and decreases usability.

And I think the only reason we are trying to spend effort eliding
things is because we're limited to this gdb-like interface where you can
only print out a little information at a time.

I think what would really help is to dump this gdb interface and instead use
multiple windows/views. As a really close example to what I imagine, I
recently came across this hex editor:

https://www.synalysis.net/

The screenshots are a bit small so it's not super clear, but this tool
has one view for the data in hex, and one view for a tree of parsed
results (which is very similar to our infoset). The "infoset" view has
information like offset/length/value, and can be related back to the
data view to find the actual bits.

I imagine the "next generation daffodil debugger" to look much like
this. As data is parsed, the infoset view fills up. This view could act
like a standard GUI tree so you could collapse sections or scroll around
to show just the parts you care about, and have search capabilities to
quickly jump around. The advantage here is you no longer really need
automated eliding or heuristics for what the user *might* care about.
You just show the whole thing and let the user scroll around. As daffodil
parses and backtracks, this tree grows or shrinks.
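
To make the "tree grows or shrinks" idea concrete, here is a minimal
sketch of the model such a view might sit on. Everything in it
(InfosetView, start_element, end_element, backtrack) is a hypothetical
name invented for illustration, not an existing Daffodil API; it only
assumes the parser can report element start, element end, and backtrack.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


class InfosetView:
    """Hypothetical model behind a collapsible infoset tree view."""

    def __init__(self, root_name: str) -> None:
        self.root = Node(root_name)
        self._open = [self.root]  # path from the root to the element in progress

    def start_element(self, name: str) -> None:
        node = Node(name)
        self._open[-1].children.append(node)
        self._open.append(node)

    def end_element(self, value: Optional[str] = None) -> None:
        self._open.pop().value = value

    def backtrack(self) -> None:
        # The element in progress is always its parent's last child,
        # so dropping it (and its subtree) is a pop on both lists.
        self._open.pop()
        self._open[-1].children.pop()

    def lines(self, node: Optional[Node] = None, depth: int = 0) -> List[str]:
        node = node or self.root
        text = "  " * depth + node.name
        if node.value is not None:
            text += f" = {node.value}"
        out = [text]
        for child in node.children:
            out.extend(self.lines(child, depth + 1))
        return out


view = InfosetView("record")
view.start_element("length")
view.end_element("4")
view.start_element("choiceBranchA")
view.start_element("field1")
view.end_element("xyz")
view.backtrack()  # branch A failed, so its subtree disappears from the view
print("\n".join(view.lines()))
```

A GUI tree widget would redraw from the same events instead of printing,
so nothing has to be re-serialized on each step.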

I also imagine you could have a cursor moving around the hex view, so as
daffodil moves around (e.g. scanning for delimiters, extracting
integers), one could update this data view to show what daffodil is
doing and where it is.
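
As a rough illustration of the cursor idea, here is a sketch of a text
hex view that brackets the byte at the current position. The function
name hex_view and the bracket style are assumptions for illustration,
and it works at byte granularity even though Daffodil tracks positions
in bits.

```python
def hex_view(data: bytes, cursor: int, width: int = 16) -> str:
    """Render data as hex rows, marking the byte at `cursor` with [..]."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        cells = []
        for i, byte in enumerate(row):
            cell = f"{byte:02x}"
            # Pad unmarked cells so columns stay aligned with marked ones.
            cells.append(f"[{cell}]" if offset + i == cursor else f" {cell} ")
        lines.append(f"{offset:08x} {''.join(cells)}")
    return "\n".join(lines)


print(hex_view(bytes(range(32)), cursor=18))
```

Calling this again with a new cursor value on every parser step would
give the moving-cursor effect in even a plain terminal.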

I also imagine there could be other views as well. For example, a schema
view to show where in the schema daffodil is, and to add/remove
breakpoints. And an information view for things like variables, in-scope
delimiters, PoU's, etc.

The only reason I mention a debug protocol is that it would allow this
GUI to be more easily written in something other than Java/Scala to take
advantage of other GUI toolkits. It's been a long while since I've done
anything with Java GUIs, but they seemed pretty poor the last time I
looked at them. It would even allow for a TUI, which Java has little/no
support for. It also enables things like remote debugging if a socket
IPC were used. Though I'm not sure all of that is necessary. Just
thinking about what would be ideal, and it can always be pared back.



Re: The future of the daffodil DFDL schema debugger?

Posted by "Beckerle, Mike" <mb...@owlcyberdefense.com>.
I don't think of it as a daffodil debug protocol, but just a separation of concerns between display of information and the behaviors of parse/unparse that need to be points where users can pause, and data structures available to display.

E.g., it is 100% a display issue that the infoset (shown as XML) is clumsy, too big, etc.  The infoset is available in the processor state, and one can examine the current node, enclosing node, prior sibling(s), following sibling(s), etc. One can elide contents that are too big for hexBinary, etc.

I think this problem, how to display the infoset with sensible limits on sizing, is fairly easy to come up with some design for, that will at least be (1) always fairly small (2) much more useful in more cases. It won't be perfect but can be much better than what we do now.

One sensible display "mode" would show the context surrounding the current element (when parsing or unparsing) in at most N lines (N/2 before, N/2 after), each with a maximum length of L characters (settable within reason?).

Sibling and enclosing nodes would be displayed eliding their contents to at most 1 line.

Here's an example of what I mean, displaying up to N=10 lines total:

...
<enclosingParent1>
   ...
   <priorSibling2>89ab782 ...</...>
   <priorSibling1>some text is here and some more text</...>
   <currentNode>value might be some big thing which needs to be elided ...</...>
   <followingSibling1> ... </...>
   ???
</enclosingParent1>
???

The </...> is just an idea to reduce XML matching end-tag clutter.

The ... on a line alone or where element content would appear generally means 1 or more other siblings. The way the display above starts with ... means that this is a relative inner nest, not starting from the absolute root.

The ... within simple content means that content is elided to fit on one line. It always follows some text characters to differentiate it from the child-element context.

The ??? means zero or more other siblings.
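
The windowing rule can be sketched as below. This is only an illustration under assumptions: the function name window and the more_may_follow flag are made up here, siblings are taken to be already rendered as one-line strings, and the current line and the marker lines are counted in addition to the N/2 siblings on each side.

```python
from typing import List


def window(siblings: List[str], current: int, n: int = 10,
           more_may_follow: bool = True) -> List[str]:
    """Show up to n//2 siblings on each side of the current one."""
    half = n // 2
    start = max(current - half, 0)
    end = min(current + half + 1, len(siblings))
    out = []
    if start > 0:
        out.append("    ...")  # one or more earlier siblings not shown
    for i in range(start, end):
        prefix = ">>> " if i == current else "    "
        out.append(prefix + siblings[i])
    if end < len(siblings):
        out.append("    ...")  # one or more later siblings not shown
    elif more_may_follow:
        out.append("    ???")  # zero or more siblings may still arrive
    return out


siblings = [f"<item{i}>value {i}</...>" for i in range(20)]
print("\n".join(window(siblings, current=10, n=4)))
```

The ">>> " prefix is the plain-text current-node marker, so the same output works for a trace written to a file.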

I used bold italic above to point out that the current node would be highlighted somehow. Probably a way to do this that doesn't require display modes would be useful. E.g., a text marker like ">>>" as in:

>>> <currentNode>value .... </...>

might be better, particularly for a trace output being dumped to a text file.

I made the above example an unparser kind of example by showing a following sibling that already exists after the current node.

I think the key concept is that any sibling node is displayed in a way that fits on one line.
E.g., even if the element name was really long, I'd suggest:

  <hereIsAnElementWithASuperLongName...>abcd ... </...>

Where the element name itself gets elided because it is too long.
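
A minimal sketch of that one-line rule, assuming a maximum line length L (max_len) and a separate cap on the element name (max_name); both parameter names and the functions elide and render_simple are hypothetical, chosen only for this illustration.

```python
def elide(text: str, limit: int, marker: str = "...") -> str:
    """Cut text to at most `limit` characters, ending in marker if cut."""
    if len(text) <= limit:
        return text
    return text[: max(limit - len(marker), 0)] + marker


def render_simple(name: str, value: str,
                  max_len: int = 60, max_name: int = 24) -> str:
    """Render a simple element on one line in the quasi-XML style."""
    open_tag = f"<{elide(name, max_name)}>"
    close_tag = "</...>"
    room = max_len - len(open_tag) - len(close_tag)
    if len(value) > room:
        # Keep some leading text, then " ..." to show the value was cut.
        value = value[: max(room - 4, 0)].rstrip() + " ..."
    return f"{open_tag}{value}{close_tag}"


print(render_simple("priorSibling1", "some text is here"))
print(render_simple("currentNode", "ab" * 200))
print(render_simple("hereIsAnElementWithASuperLongNameThatKeepsGoing", "abcd"))
```

Complex elements would need a similar rule that replaces child content with " ... ", which is left out here.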

A thought. Note that the above presentation is shown as quasi-XML, but there's nothing XML-specific about it. A JSON-friendly equivalent could be done as well:

enclosingParent1 = {
   ...
   priorSibling2 = "89ab782..."
   priorSibling1 = "some text is here and some more text"
   currentNode = "value might be some big thing which needs to be elided ..."
   followingSibling1 = { ... }
   ???
}

That's enough for 1 email thread on this debug topic.


________________________________
From: Steve Lawrence <sl...@apache.org>
Sent: Tuesday, January 5, 2021 2:26 PM
To: dev@daffodil.apache.org <de...@daffodil.apache.org>
Subject: The future of the daffodil DFDL schema debugger?


Now that we're in a new year, I'd like to start a discussion about the
Daffodil DFDL Schema debugger and how it might be improved to be more
useful.

Note that this is not the capability to debug Daffodil itself in
something like Eclipse/IntelliJ, but the ability for Daffodil to provide
enough extra information during a parse/unparse so that a schema
developer can get an idea of what Daffodil is doing. This makes it
easier for users (rather than developers) to determine why a schema
isn't giving the expected parse/unparse result (either because of bad
data or a faulty schema).

The current state of the debugger is enabled by providing the --debug or
--trace flags in the CLI. More information about that here:

https://daffodil.apache.org/debugger/

This enables a TUI and commands somewhat similar to GDB, providing things
like breakpoints, steps, displaying the current infoset, displaying a
dump of the data, etc.

Although I find this tool pretty useful, it definitely has some glaring
issues.

The most glaring to me is that it really isn't useful at all for
debugging unparse. The data dumps only include the main output stream,
so determining things like suspensions and buffered output is impossible.

Another issue is the infoset output. When outputting the infoset, the
debugger currently just walks the entire thing, converts it to XML,
and displays the XML. For large infosets, this is excessive and can make
it impossible to use, even with the configurations that limit how much
of that infoset is actually printed to the screen. Also, things like
large hex binary blobs create excessive and unusable output.

Another thing I feel is missing is a schema view. Right now it's very
difficult to know where in the schema Daffodil actually is.

I think these issues just need some thought and improvement. One could
imagine a better way to stringify our unparse buffers for debug. One
could imagine a way to receive infoset state changes so the debugger can
track things like backtracks and remove infosets. One could imagine a
way to display the schema.

We just need a better way to stringify the current state of the unparse
data including buffers, and we need a way for the debugger to receive
state change information about the infoset so it can update displays
rather than just constantly printing the entire infoset.

However, I think another big issue is just usability in general. I
think the CLI usage is reasonable, but it's not always user friendly,
and it is difficult to view multiple things at the same time. I think
because of this very few people even use this tool. So this seems like
perhaps something worth focusing on.

My first thought for improving this usability issue would be to implement
the Debug Adapter Protocol (DAP)
(https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
which many IDEs implement. With this implemented, Daffodil could be
plugged in to any IDE that supports it and essentially get debugging for
free, without the need to worry about the GUI elements.
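
For a sense of what implementing DAP involves on the wire, here is a
small sketch of its message framing: each message is a JSON body
preceded by a Content-Length header and a blank line. The setBreakpoints
request shape follows the DAP specification; the schema path and line
number are made-up values, and the encode/decode helpers are
illustrative only, not part of Daffodil.

```python
import json


def encode(message: dict) -> bytes:
    """Frame one DAP message: Content-Length header, blank line, JSON body."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode(frame: bytes) -> dict:
    """Parse a single framed DAP message back into a dict."""
    header, _, rest = frame.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        key, _, value = line.partition(b":")
        if key.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


# A breakpoint on a line of a DFDL schema (hypothetical path and line).
request = {
    "seq": 1,
    "type": "request",
    "command": "setBreakpoints",
    "arguments": {
        "source": {"path": "/path/to/schema.dfdl.xsd"},
        "breakpoints": [{"line": 42}],
    },
}

frame = encode(request)
print(frame.decode("utf-8"))
```

In this picture the DFDL schema file plays the role that source code
plays for an ordinary language debugger.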

I do have concerns that this just wouldn't have enough of the
functionality that we'd really need. For example, DAP really only has
the ability to show code (Daffodil's equivalent is the DFDL schema).
There isn't a way to show a live view of the infoset or data. Most DAP
IDEs do have a console output, so we could potentially make it so the
console output is a live view of infoset/data. But I'm not even sure
most DAP-friendly IDEs could support this kind of console output. Does
anyone have familiarity with DAP IDEs and what kinds of console
capabilities are available?

I also looked into TUI libraries with the idea that we could just extend
our current debugger user interface to be a bit friendlier.
Unfortunately, there aren't too many Java/Scala TUI libraries and those
that do exist don't have Apache-friendly licenses. We also want to be
careful about increasing dependencies just for a debugger that many
people might not use, so large graphics libraries are probably out of
the question.

This also makes me wonder if an approach worth taking for the future of
Daffodil schema debugging is developing a sort of "Daffodil Debug
Protocol". I imagine it would be loosely based on DAP (which is
essentially JSON message based) but could be targeted to the things that
a DFDL schema debugger would really need. An added benefit with some
sort of protocol is the debugger interface can be uncoupled from
Daffodil itself, so we could implement a TUI/GUI/whatever in any
language/GUI framework and just have it communicate the protocol over
some form of IPC. Another benefit is that any future backends could
implement this protocol and so a single debugger could hook into
different backends without much issue. Unfortunately, defining such a
protocol might be a large task, but we do have our existing debug
infrastructure and things like DAP to guide its development/design.

Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps we
really just need the few improvements mentioned to the existing
debugger. Is that enough to make it usable? Or is an entirely different
approach needed for debugging schemas?