Posted to dev@corinthia.apache.org by Peter Kelly <pm...@apache.org> on 2015/01/03 13:02:56 UTC

My plans for January

Inspired by Jan’s excellent idea of posting what we each plan to work on, I thought I’d chip in with my intentions:

- Complete development of a generic parser library based on Parsing Expression Grammars [1,2], which will serve as a basis for parsing non-XML-based file formats like Markdown, AsciiDoc, reStructuredText, and RTF. This is something I’ve been dabbling with on and off for about a year now, and I recently rewrote it completely. I also foresee potential in extending this into a high-level programming language for expressing transformations, similar to XSLT or Stratego/XT [3], but that’s something for a little further down the track.

I’ll put this code in a separate, experimental branch once it’s in a vaguely reasonable state - Real Soon Now (TM).

- Implement parsers for XML and HTML. Theoretically this could be done with the PEG-based parser above, but it will be quicker and easier to do “manually”, as neither is very complicated to do. This will allow us to remove the external dependencies on libxml2, iconv, and htmltidy. I’ll likely do this first, given that it’s the easiest.

Note that given these dependencies will shortly be going away, I recommend against trying to isolate them in platform, as doing so will likely be more effort than writing the parsers themselves due to the dependencies on data structures used in core (specifically the DOM classes), which aren’t accessible from platform.

- Document more of the code base. This will include coding conventions - how things like error handling, memory management, and string representation/manipulation are carried out by the library. It will also cover the core classes and parts of the existing Word filter.

For those of you interested in formal language theory and parsing techniques, I recommend reading [4], which describes some of the history and recent developments such as packrat parsing. These make for practical and simpler implementations of parsers for a more general range of languages than those handled by the LL/LR grammars of old. Flex and Bison users in particular should find it a relief to read :)
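
To make the packrat idea concrete, here is a minimal sketch in Python. It is not the library's actual code or API: the toy arithmetic grammar, the class name ArithmeticParser, and the parse helper are all made up for illustration. Each PEG rule is an ordinary recursive function, ordered choice simply tries the alternatives in sequence, and every (rule, position) result is memoized so that no rule is evaluated twice at the same offset. That memoization is what gives packrat parsing its linear-time bound.

```python
from typing import Callable, Dict, Optional, Tuple

# A rule result is (value, next_position), or None if the rule failed.
Result = Optional[Tuple[int, int]]


class ArithmeticParser:
    """Illustrative packrat parser for the PEG:

        Expr   <- Term ('+' Term)*
        Term   <- Factor ('*' Factor)*
        Factor <- Number / '(' Expr ')'
        Number <- [0-9]+
    """

    def __init__(self, text: str) -> None:
        self.text = text
        # Memo table keyed by (rule name, input position).
        self.memo: Dict[Tuple[str, int], Result] = {}

    def _apply(self, name: str, rule: Callable[[int], Result], pos: int) -> Result:
        """Evaluate a rule at pos at most once, caching success or failure."""
        key = (name, pos)
        if key not in self.memo:
            self.memo[key] = rule(pos)
        return self.memo[key]

    def expr(self, pos: int) -> Result:
        return self._apply("Expr", self._expr, pos)

    def term(self, pos: int) -> Result:
        return self._apply("Term", self._term, pos)

    def factor(self, pos: int) -> Result:
        return self._apply("Factor", self._factor, pos)

    def number(self, pos: int) -> Result:
        return self._apply("Number", self._number, pos)

    def _expr(self, pos: int) -> Result:
        first = self.term(pos)
        if first is None:
            return None
        value, pos = first
        # Repetition: keep going while '+' Term matches; a failed
        # iteration consumes nothing and ends the loop.
        while pos < len(self.text) and self.text[pos] == "+":
            rest = self.term(pos + 1)
            if rest is None:
                break
            value += rest[0]
            pos = rest[1]
        return value, pos

    def _term(self, pos: int) -> Result:
        first = self.factor(pos)
        if first is None:
            return None
        value, pos = first
        while pos < len(self.text) and self.text[pos] == "*":
            rest = self.factor(pos + 1)
            if rest is None:
                break
            value *= rest[0]
            pos = rest[1]
        return value, pos

    def _factor(self, pos: int) -> Result:
        # Ordered choice: try Number first, then the parenthesised form.
        number = self.number(pos)
        if number is not None:
            return number
        if pos < len(self.text) and self.text[pos] == "(":
            inner = self.expr(pos + 1)
            if inner is not None:
                value, end = inner
                if end < len(self.text) and self.text[end] == ")":
                    return value, end + 1
        return None

    def _number(self, pos: int) -> Result:
        end = pos
        while end < len(self.text) and "0" <= self.text[end] <= "9":
            end += 1
        if end == pos:
            return None
        return int(self.text[pos:end]), end


def parse(text: str) -> Optional[int]:
    """Return the value of the expression, or None unless all input matches."""
    result = ArithmeticParser(text).expr(0)
    if result is None or result[1] != len(text):
        return None
    return result[0]
```

With this sketch, parse("2+3*4") returns 14 and parse("(2+3)*4") returns 20, while incomplete input such as "2+" yields None because the top-level rule does not consume the whole string.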

[1] Bryan Ford: Parsing expression grammars: a recognition-based syntactic foundation. POPL 2004: 111-122. http://bford.info/pub/lang/peg.pdf

[2] Bryan Ford: Packrat parsing: simple, powerful, lazy, linear time, functional pearl. ICFP 2002: 36-47. http://bford.info/pub/lang/packrat-icfp02.pdf

[3] http://strategoxt.org

[4] Lennart C. L. Kats, Eelco Visser, Guido Wachsmuth: Pure and declarative syntax definition: paradise lost and regained. OOPSLA 2010: 918-932. http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2010-019.pdf

—
Dr Peter M. Kelly
pmkelly@apache.org

PGP key: http://www.kellypmk.net/pgp-key
(fingerprint 5435 6718 59F0 DD1F BFA0 5E46 2523 BAA1 44AE 2966)


Re: My plans for January

Posted by jan i <ja...@apache.org>.
On 3 January 2015 at 13:02, Peter Kelly <pm...@apache.org> wrote:

> Inspired by Jan’s excellent idea of posting what we each plan to work on,
> I thought I’d chip in with my intentions:
>
> - Complete development of a generic parser library based on Parsing
> Expression Grammars [1,2], which will serve as a basis for parsing
> non-XML-based file formats like Markdown, AsciiDoc, reStructuredText, and
> RTF. This is something I’ve been dabbling with on and off for about a year
> now, and I recently rewrote it completely. I also foresee potential in
> extending this into a high-level programming language for expressing
> transformations, similar to XSLT or Stratego/XT [3], but that’s something
> for a little further down the track.
>
I like the idea, especially after having read up on Stratego/XT. However, we
still need at some point to discuss how we store information internally,
and how filters can access this information.


>
> I’ll put this code in a separate, experimental branch once it’s in a
> vaguely reasonable state - Real Soon Now (TM).
>
> - Implement parsers for XML and HTML. Theoretically this could be done
> with the PEG-based parser above, but it will be quicker and easier to do
> “manually”, as neither is very complicated to do. This will allow us to
> remove the external dependencies on libxml2, iconv, and htmltidy. I’ll
> likely do this first, given that it’s the easiest.
>
+1 I would really like to see those go away.

>
> Note that given these dependencies will shortly be going away, I recommend
> against trying to isolate them in platform, as doing so will likely be more
> effort than writing the parsers themselves due to the dependencies on data
> structures used in core (specifically the DOM classes), which aren’t
> accessible from platform.
>
Agreed, not in my current plans anyhow.


>
> - Document more of the code base. This will include coding conventions -
> how things like error handling, memory management, and string
> representation/manipulation are carried out by the library. It will also
> cover the core classes and parts of the existing Word filter.
>
Coding conventions would be really nice to have as a policy web page. I am
working with Dorte on a couple of extensions to our website, so if you can
write the raw text, then Dorte can convert drawings etc. into the responsive
design.


>
> For those of you interested in formal language theory and parsing
> techniques, I recommend reading [4], which describes some of the history
> and recent developments such as packrat parsing. These make for practical
> and simpler implementations of parsers for a more general range of
> languages than those handled by the LL/LR grammars of old. Flex and Bison
> users in particular should find it a relief to read :)
>
> [1] Bryan Ford: Parsing expression grammars: a recognition-based syntactic
> foundation. POPL 2004: 111-122. http://bford.info/pub/lang/peg.pdf
>
> [2] Bryan Ford: Packrat parsing: simple, powerful, lazy, linear time,
> functional pearl. ICFP 2002: 36-47.
> http://bford.info/pub/lang/packrat-icfp02.pdf
>
> [3] http://strategoxt.org
>
> [4] Lennart C. L. Kats, Eelco Visser, Guido Wachsmuth: Pure and
> declarative syntax definition: paradise lost and regained. OOPSLA 2010:
> 918-932.
> http://swerl.tudelft.nl/twiki/pub/Main/TechnicalReports/TUD-SERG-2010-019.pdf
>

rgds
jan i.

