
Posted to dev@freemarker.apache.org by Stephan Müller <st...@notatoaster.org> on 2018/08/05 16:58:11 UTC

Re: Anybody interested in some FM3 parser research?

On 04.07.2018 at 19:28, Daniel Dekany wrote:
> I wonder what parser libraries could help us, in FM3, to separate the
> expression language parsing from the top-level language (like
> `<#foo>`, `${...}`, etc.) parsing. Or if a hand-written parser is an
> acceptable compromise. It would be good if we could change the
> top-level syntax and still reuse the expression syntax. (Or replace
> the expression syntax and reuse the top-level one.) Like, somebody
> wants a syntax like `#foo(exp)` instead of `<#foo exp>`, but still
> wants to reuse the expression syntax. (For me it was always part of
> the FM3 agenda, though it might prove to be too much...)
> [..]

Over the last few days I took a high-level look at various parser
generators. As one might imagine, there are a lot of them, with
different licenses, different levels of maturity, different states of
maintenance, and so on.

Following https://www.apache.org/legal/resolved.html, I ignored all
parser generators that may not be included in Apache projects because
of their license (GNU GPL in particular).

IMHO this leaves us with:

* LL(k) parsers: ANTLR, JavaCC and Grammatica
* LALR parsers: CookCC
* PEG parsers: Mouse
* parser combinators: jparsec, parboiled and PetitParser

This list is not exhaustive, so I have probably missed some interesting
projects. If so, please share them; I'd like to have a look at those, too.

My idea for the next step: define a really small subset of FTL and try 
to implement PoCs for this subset with the candidates which I mentioned 
above.

The subset might be something like

* interpolations: ${..}
* directives: if, assign
* expressions: numbers, variables, +
* variants of the parsers with different delimiters
* split into two parsers (interpolations/directives vs. expression language)
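To make the proposed subset a bit more concrete, here is a minimal hand-written sketch in Java. It is not taken from any of the candidate libraries and not from FreeMarker; the class names `MiniExpressionParser` and `MiniTemplateParser` and the string-based tree dump are assumptions made purely for illustration. It covers interpolations with configurable delimiters and expressions built from numbers, variables and `+`, with the top-level parser and the expression parser kept separate (directives are left out).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical expression parser: numbers, variables and '+' only. */
final class MiniExpressionParser {
    private final String src;
    private int pos;

    private MiniExpressionParser(String src) { this.src = src; }

    /** Parses the whole string and returns a parenthesized dump of the tree. */
    static String parse(String src) {
        MiniExpressionParser p = new MiniExpressionParser(src);
        String tree = p.sum();
        p.skipSpaces();
        if (p.pos < src.length()) {
            throw new IllegalArgumentException(
                    "Unexpected '" + src.charAt(p.pos) + "' at offset " + p.pos);
        }
        return tree;
    }

    // sum := operand ('+' operand)*   (left-associative)
    private String sum() {
        String left = operand();
        skipSpaces();
        while (pos < src.length() && src.charAt(pos) == '+') {
            pos++;
            String right = operand();
            left = "(" + left + " + " + right + ")";
            skipSpaces();
        }
        return left;
    }

    // operand := NUMBER | IDENTIFIER
    private String operand() {
        skipSpaces();
        int start = pos;
        if (pos < src.length() && Character.isDigit(src.charAt(pos))) {
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return "num:" + src.substring(start, pos);
        }
        if (pos < src.length() && Character.isLetter(src.charAt(pos))) {
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
            return "var:" + src.substring(start, pos);
        }
        throw new IllegalArgumentException("Expected a number or a variable at offset " + pos);
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }
}

/** Hypothetical top-level parser: only knows about text and interpolation delimiters. */
final class MiniTemplateParser {
    static List<String> parse(String template, String open, String close) {
        List<String> nodes = new ArrayList<>();
        int pos = 0;
        while (pos < template.length()) {
            int start = template.indexOf(open, pos);
            if (start < 0) {
                nodes.add("text:" + template.substring(pos));
                break;
            }
            if (start > pos) nodes.add("text:" + template.substring(pos, start));
            int end = template.indexOf(close, start + open.length());
            if (end < 0) {
                throw new IllegalArgumentException("Unclosed " + open + " at offset " + start);
            }
            // The expression language is delegated to the other parser.
            String body = template.substring(start + open.length(), end);
            nodes.add("interp:" + MiniExpressionParser.parse(body));
            pos = end + close.length();
        }
        return nodes;
    }
}
```

With this split, `MiniTemplateParser.parse("a ${x + 1} b", "${", "}")` and `MiniTemplateParser.parse("a [=x + 1] b", "[=", "]")` reuse the same expression parser and differ only in the delimiters passed to the top-level one.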

What do you think?


Stephan.

P.S.: my more detailed list of parser generators can be found here: 
https://gist.github.com/chaquotay/8041096bad36f6f3f0d4166d6f8623b5

Re: Anybody interested in some FM3 parser research?

Posted by Daniel Dekany <dd...@apache.org>.

Tuesday, August 7, 2018, 10:42:02 AM, Angelo zerr wrote:

> Hi Daniel,
>
> Many thanks for working on this issue.

Actually it's Stephan who works on it at the moment. Also, note that
it's for FM3, not for FM2. Though FM3 will need IDE plugins as well.

> In my case, I'm waiting for "tolerant" parser feature to continue my
> work with Freemarker Language Server

Honestly, I doubt that we will be able to reuse this next generation
FM3 parser for FM2, so I'm still saying that the old plugin can't
recover either, and for now the new plugin only needs to beat the old
one. The production parser of FM2 must be very strictly backward
compatible (it must emulate all the historical glitches, according to
the incompatibleImprovements setting). Surely the parser for the IDE
need not be that accurate, and so, with quite significant work, the
next generation FM3 parser could be backported to parse FM2. However,
in the IDEs it could then only be used in addition to the real FM2
parser, since you want to catch all parse errors that will pop up in
production.

Anyway, for now we should focus on FM3, and then we will see better
what can be backported to FM2.

> https://github.com/angelozerr/freemarker-languageserver/ which uses
> a custom tolerant parser (which basically parses XML). If you could also
> provide the ability to update an existing Freemarker DOM when the content
> changes (e.g. the user types a space or some FM content in the editor),
> that would be fantastic.

Supporting user-defined dialects (a set of user defined directives and
functions that are resolved/validated during parsing, plus maybe
custom syntax) is a main goal of FM3, if that covers what you mean.

> It would avoid reparsing the full content of the editor to rebuild the
> Freemarker DOM (incremental parsing).
>
> In other words, to support IDEs, we need:
>
>  * tolerant parser (required)
>  * incremental parser (optional)

Indeed, incremental parsing is a point I have missed. Though it's
surely not a requirement as far as the research done by Stephan is
concerned. It will already be a miracle if we find a library that can
address all the other wishes (while it's also fast enough).

Maybe the solution will be that only the expression parser uses a
lexer/parser generator, and the top-level language parser is
hand-written so that we have maximum flexibility. Expressions are
usually short and are always enclosed in top-level language constructs
(i.e., in `${}` or in the arguments inside "FreeMarker tags"). When
expression parsing fails, we give up parsing that expression, but then
find the (suspected) end of the expression in the enclosing top-level
construct with some hand-written code (like, we find the closing "}"
of the "${" that contains the malformed expression, intelligently
skipping string literals and such). With that we are back in a normal
parsing state (we just have an error node inside that `${}`), so we
can continue parsing. So inside expressions we stop at the first
error, we aren't incremental, we do nothing fancy, but as this
simplistically parsed region ends at the end of the expression, it
sounds acceptable to me. The tricky part is the top-level language
parser (even in FM2, actually), where we want to continue after
errors, maybe we want incremental parsing, we want to make parse-time
decisions based on the Dialect provided at runtime, etc., and that's
why a hand-written parser could be beneficial there.
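The recovery step described above (finding the closing "}" of a "${" while skipping string literals) could look roughly like the following Java sketch. The class and method names are hypothetical, and it assumes only single- and double-quoted literals with backslash escapes; raw strings, comments and other FTL details are deliberately ignored.

```java
/** Hypothetical helper for resuming top-level parsing after a malformed expression. */
final class InterpolationRecovery {
    /**
     * Returns the index of the '}' that closes the "${" whose body starts at
     * bodyStart, or -1 if none is found. Braces inside quoted string literals
     * are ignored, and nested '{' ... '}' pairs (such as hash literals) are
     * balanced.
     */
    static int findInterpolationEnd(CharSequence src, int bodyStart) {
        int depth = 0;
        int i = bodyStart;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                i = skipStringLiteral(src, i);
                continue;
            }
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                if (depth == 0) return i;
                depth--;
            }
            i++;
        }
        return -1;
    }

    /**
     * Returns the index just after the string literal that opens at
     * quoteIndex, or src.length() if the literal is never terminated.
     */
    private static int skipStringLiteral(CharSequence src, int quoteIndex) {
        char quote = src.charAt(quoteIndex);
        int i = quoteIndex + 1;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\') {        // skip the escaped character as well
                i += 2;
                continue;
            }
            if (c == quote) return i + 1;
            i++;
        }
        return src.length();
    }
}
```

The top-level parser would record an error node spanning from the "${" to the returned index and continue after it; a result of -1 means there is no sensible place to resume inside this interpolation.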

If we go down the above path, then a requirement will be that we
must be able to run many little independent expression parsings
without much overhead. (At a quick glance JavaCC can't do that. Or at
least we would have to do some awkward hacks.)
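One way to read "without much overhead" is that a single parser instance can be pointed at successive regions of the template without copying the text or allocating new lexer state. The Java sketch below illustrates only that idea, under the assumption that this is the kind of overhead meant; `RegionSumParser` is a hypothetical name, and the "expression language" is reduced to sums of integers to keep it short.

```java
/** Hypothetical reusable parser: one instance, many independent parsings. */
final class RegionSumParser {
    private CharSequence src;
    private int pos;
    private int end;

    /**
     * Parses src[start, end) as: integer ('+' integer)* and returns the sum.
     * No substring is created; the parser just moves within the given region.
     */
    long parse(CharSequence src, int start, int end) {
        this.src = src;
        this.pos = start;
        this.end = end;
        long sum = integer();
        skipSpaces();
        while (pos < end && src.charAt(pos) == '+') {
            pos++;
            sum += integer();
            skipSpaces();
        }
        if (pos < end) {
            throw new IllegalArgumentException("Unexpected character at offset " + pos);
        }
        return sum;
    }

    private long integer() {
        skipSpaces();
        int digitsStart = pos;
        long value = 0;
        while (pos < end && Character.isDigit(src.charAt(pos))) {
            value = value * 10 + Character.digit(src.charAt(pos), 10);
            pos++;
        }
        if (pos == digitsStart) {
            throw new IllegalArgumentException("Expected an integer at offset " + pos);
        }
        return value;
    }

    private void skipSpaces() {
        while (pos < end && Character.isWhitespace(src.charAt(pos))) pos++;
    }
}
```

A hand-written top-level parser would keep one such instance and call `parse(template, bodyStart, bodyEnd)` for each `${...}` it encounters. Whether a given parser library permits this kind of cheap restart is exactly what would need to be checked per candidate.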

> The Java JDT ICompilationUnit of Eclipse provides this feature. It's one
> reason why Java editor completion, etc. is so fast.
>
> Regards, Angelo

-- 
Thanks,
 Daniel Dekany

Re: Anybody interested in some FM3 parser research?

Posted by Angelo zerr <an...@gmail.com>.

Hi Daniel,

Many thanks for working on this issue. In my case, I'm waiting for a
"tolerant" parser feature to continue my work with Freemarker Language
Server https://github.com/angelozerr/freemarker-languageserver/ which uses
a custom tolerant parser (which basically parses XML). If you could also
provide the ability to update an existing Freemarker DOM when the content
changes (e.g. the user types a space or some FM content in the editor),
that would be fantastic. It would avoid reparsing the full content of the
editor to rebuild the Freemarker DOM (incremental parsing).

In other words, to support IDEs, we need:

 * tolerant parser (required)
 * incremental parser (optional)

The Java JDT ICompilationUnit of Eclipse provides this feature. It's one
reason why Java editor completion, etc. is so fast.

Regards, Angelo




Re: Anybody interested in some FM3 parser research?

Posted by Stephan Müller <st...@notatoaster.org>.

Hi Daniel,

Thanks for your reminder, and sorry that I didn't manage to answer earlier.

I have to confess that I hugely, massively misjudged the amount of time
I'm able to spend on this topic. So apart from my initial list of
possible candidates (which I shared some months ago) and one small
experiment with one library, I didn't manage to evaluate anything in
greater detail. So no, right now I cannot provide any recommendation,
and I fear that I won't be able to do so any time soon.


Stephan.


On 15.11.2018 at 21:15, Daniel Dekany wrote:
> Did this activity lead anywhere? For example, do you have a
> recommendation for the lexer/parser library to use, if we only want to
> handle the expression syntax with it (so starting a new lexing/parsing
> should have low overhead)?

Re: Anybody interested in some FM3 parser research?

Posted by Daniel Dekany <dd...@apache.org>.

Did this activity lead anywhere? For example, do you have a
recommendation for the lexer/parser library to use, if we only want to
handle the expression syntax with it (so starting a new lexing/parsing
should have low overhead)?


-- 
Thanks,
 Daniel Dekany

Re: Anybody interested in some FM3 parser research?

Posted by Daniel Dekany <dd...@apache.org>.

Sunday, August 5, 2018, 6:58:11 PM, Stephan Müller wrote:

> On 04.07.2018 at 19:28, Daniel Dekany wrote:
>> I wonder what parser libraries could help us, in FM3, to separate the
>> expression language parsing from the top-level language (like
>> `<#foo>`, `${...}`, etc.) parsing. Or if a hand-written parser is an
>> acceptable compromise. It would be good if we could change the
>> top-level syntax and still reuse the expression syntax. (Or replace
>> the expression syntax and reuse the top-level one.) Like, somebody
>> wants a syntax like `#foo(exp)` instead of `<#foo exp>`, but still
>> wants to reuse the expression syntax. (For me it was always part of
>> the FM3 agenda, though it might prove to be too much...)
>> [..]
>
> During the last days I had a high-level look at different parser 
> generators, and as one might imagine, there are a lot of parser 
> generators, with different licenses, different maturities, different 
> states of maintenance and so on.
>
> Due to https://www.apache.org/legal/resolved.html I ignored all parser
> generators which may not be included in Apache projects because of their
> license, especially GNU GPL etc.
>
> IMHO this leaves us with:
>
> * LL(k) parsers: ANTLR, JavaCC and Grammatica
> * LALR parsers: CookCC
> * PEG parsers: Mouse
> * parser combinators: jparsec, parboiled and PetitParser
>
> This list is not exhaustive, so I probably forget some interesting 
> projects. If so, please share, I'd like to have a look into these, too.
>
> My idea for the next step: define a really small subset of FTL and try
> to implement PoCs for this subset with the candidates which I mentioned
> above.
>
> The subset might be something like
>
> * interpolations: ${..}
> * directives: if, assign

Just to be on the safe side, I will note that you shouldn't try to
hard-code parser logic that's specific to a directive (like "if").
Instead, you should try to parse a unified/generic directive call
syntax, and then invoke the Dialect to find out the further rules. And
that's tricky, as then the parser definition doesn't specify which
tags have an end-tag pair, and what can be nested between them, only
the Dialect knows that. Like, if you look at the current parser, it
basically says that "if" is like

  "<#" "if" Expression ">" MixedContent "</#" "if" ">"

which is expressive and all, but sadly it won't be possible in FM3 to
do it like that.
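A rough Java sketch of that idea follows: one generic rule recognizes every tag, and a Dialect object is asked whether a given directive has nested content and an end tag. Everything here is an assumption for illustration. The `Dialect` interface and its `hasEndTag` method are hypothetical stand-ins, not an existing FM3 API, and the tag regex is a simplification that cannot handle a `>` inside the arguments.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GenericDirectiveParser {
    /** Hypothetical stand-in for the Dialect: it alone knows each directive's shape. */
    interface Dialect {
        /** True if the directive has nested content that is closed by an end tag. */
        boolean hasEndTag(String directiveName);
    }

    // One generic rule for all tags: "<#name args>" or "</#name>".
    // The arguments (group 3) are skipped here; a real parser would hand
    // them to the expression parser.
    private static final Pattern TAG = Pattern.compile("<(/?)#(\\w+)([^>]*)>");

    /** Returns an indented outline of the directive calls found in the template. */
    static String outline(String template, Dialect dialect) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(template);
        while (m.find()) {
            String name = m.group(2);
            if (m.group(1).isEmpty()) {
                // Start tag: the grammar does not know whether it nests; the Dialect does.
                indent(out, open.size()).append(name).append('\n');
                if (dialect.hasEndTag(name)) open.push(name);
            } else {
                // End tag: must close the innermost open directive.
                if (open.isEmpty() || !open.peek().equals(name)) {
                    throw new IllegalArgumentException(
                            "Unexpected </#" + name + "> at offset " + m.start());
                }
                open.pop();
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Missing </#" + open.peek() + ">");
        }
        return out.toString();
    }

    private static StringBuilder indent(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) out.append("  ");
        return out;
    }
}
```

Because `Dialect` has a single method, a test dialect can be written as a lambda, for example `name -> name.equals("if") || name.equals("list")`. Note that nothing in the parser itself mentions "if".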

> * expressions: numbers, variables, +
> * variants of the parsers with different delimiters
> * split into two parsers (interpolations/directives vs. expression language)
>
> What do you think?

I haven't used any parser library but JavaCC, so I have no tips
there. Otherwise the plan sounds good.

Anyway, I'm kind of repeating myself here, but these are the
expectations that may narrow down the candidates quickly:

- Splitting into two parsers, of course

- Maintainability of custom syntax variations (like new FreeMarker
  versions won't break them, or at least they need no manual work to
  regenerate them)

- How parsing partially driven by the Dialect looks... it won't fit
  JavaCC well for example. (But, probably it won't be very nice with
  any of them.)

In case several of the libraries survive, some further extras that
could decide between them:

- More understandable/helpful error messages are a big plus.

- It would be interesting to see how hard it is to write a parser that
  continues parsing after the first error, to catch more errors. This
  is mostly for IDEs.

> Stephan.
>
> P.S.: my more detailed list of parser generators can be found here: 
> https://gist.github.com/chaquotay/8041096bad36f6f3f0d4166d6f8623b5

-- 
Thanks,
 Daniel Dekany

Re: Anybody interested in some FM3 parser research?

Posted by Angelo zerr <an...@gmail.com>.

Hi Daniel,

Thanks for your answer. When your work on the parser is available,
please ping me and I will try to consume it and give you feedback.

Regards, Angelo

On Sat, Aug 25, 2018 at 09:44, Daniel Dekany <dd...@apache.org> wrote:

> Any progress? (I understand if not, with a newborn and all, I'm just
> curious.)
>
> Note my answer to Angelo. As I said there, maybe we should settle with
> only expression parsing done by the parser library, and the others are
> done "manually". While that removes most of the tricky requirements
> from the parser library, it brings in a new one, that doing many
> independent expression parsings should have minimal overhead.

Re: Anybody interested in some FM3 parser research?

Posted by Daniel Dekany <dd...@apache.org>.

Monday, August 27, 2018, 10:22:31 AM, Stephan Müller wrote:

>
> On 25.08.2018 at 09:44, Daniel Dekany wrote:
>> Any progress? (I understand if not, with a newborn and all, I'm just
>> curious.)
>
> Not a lot of progress, but at least I began looking into the first
> library last weekend. I'll push my experimental code to github as soon
> as it makes sense, so that everyone can have a look at it.

Great, looking forward to seeing the outcome!

>> Note my answer to Angelo. As I said there, maybe we should settle with
>> only expression parsing done by the parser library, and the others are
>> done "manually". While that removes most of the tricky requirements
>> from the parser library, it brings in a new one, that doing many
>> independent expression parsings should have minimal overhead.
>
> Yep, I noticed that part. I've done some manual parsing in the past
> (simple top-down LL(k)), but in my limited experience things like good
> error reporting and error recovery might be a little bit easier when
> using a parser library which already has decent support for this kind
> of thing. But it's definitely something to keep in mind as a "plan b".

I see this "plan b" as a quite likely outcome, so I won't be shocked.
But we will see anyway.

Some musings...

Our current JavaCC solution has quite unhelpful error reporting when
it comes to the JavaCC-generated part of the error messages. Basically
it just dumps the tokens it expected at the user, which I think is
rarely helpful for the average user, and is scary... That's maybe
nearly as much as can be generated automatically, but my point is that
we could get by without that feature, which is good, because I can't
imagine a practical way to implement it in a hand-written parser.
Surely JavaCC also tracks the positions for you, which is obviously
needed for useful error messages, but I believe that part is not that
big a deal to do manually.
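
To illustrate the position tracking mentioned above, here is a minimal
sketch of how a hand-written parser might keep line and column numbers
while consuming a template. The class name PositionTrackingReader and its
methods are hypothetical, made up for this example; they are not part of
FreeMarker or JavaCC.

```java
// Hypothetical helper (not FreeMarker API): tracks line/column while a
// hand-written parser consumes the template text one character at a time.
final class PositionTrackingReader {
    private final String src;
    private int pos = 0;
    private int line = 1;    // 1-based line of the next unread character
    private int column = 1;  // 1-based column of the next unread character

    PositionTrackingReader(String src) {
        this.src = src;
    }

    boolean atEnd() {
        return pos >= src.length();
    }

    /** Consumes one character and updates the line/column counters. */
    char next() {
        char c = src.charAt(pos++);
        if (c == '\n') {
            line++;
            column = 1;
        } else {
            column++;
        }
        return c;
    }

    int line() {
        return line;
    }

    int column() {
        return column;
    }

    /** Builds an error message that points at the current position. */
    String error(String message) {
        return message + " (at line " + line + ", column " + column + ")";
    }
}
```

Every parsing method would read through such an object, so any error
message can include the position without extra bookkeeping.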

Another thing that the current JavaCC parser does poorly is
reporting tag pairing problems. We tried to hack some error
post-processing on top of it, which tells you which tag you have
forgotten, but it's ugly, and doesn't work reliably. But as I said much
earlier, this type of problem rises to a new level with FM3's dialects
feature. It's doable even in JavaCC, but at the cost of losing some of
the elegance that is the reason to use a parser generator in the first
place.
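
As a rough illustration of tag pairing checks done by hand, the sketch
below keeps a stack of open tags and names the tag that is missing or
mismatched. It is only an assumption of how this could look: the class
TagPairChecker, the fixed set of block directives, and the simplistic
regular expression (which would break on a `>` inside an expression) are
all invented for the example and do not reflect the actual FreeMarker
parser.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not FreeMarker code): reports unpaired tags by name.
final class TagPairChecker {
    // Assumption for this sketch: only these directives need an end tag.
    private static final Set<String> BLOCK_DIRECTIVES =
            new HashSet<>(Arrays.asList("if", "list"));

    // Matches <#name ...> and </#name>; group 1 is "/" for end tags.
    // Simplification: a '>' inside the tag's expression is not handled.
    private static final Pattern TAG = Pattern.compile("<(/?)#(\\w+)[^>]*>");

    /** Returns null if all tags pair up, else a description of the first problem. */
    static String check(String template) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(template);
        while (m.find()) {
            String name = m.group(2);
            if (!BLOCK_DIRECTIVES.contains(name)) {
                continue; // directives without an end tag are ignored here
            }
            if (m.group(1).isEmpty()) {
                open.push(name); // start tag
            } else if (open.isEmpty()) {
                return "Unexpected </#" + name + "> with no open tag";
            } else if (!open.peek().equals(name)) {
                return "Found </#" + name + "> but <#" + open.peek() + "> is still open";
            } else {
                open.pop(); // matching end tag
            }
        }
        return open.isEmpty() ? null : "Missing </#" + open.peek() + ">";
    }
}
```

For example, TagPairChecker.check("<#if x>a") returns "Missing </#if>",
which is the kind of message that is hard to get out of a generated
parser's expected-token list.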

BTW, another variation is when you use a lexer generator, but not a
parser generator...

> Stephan.
>
>> Sunday, August 5, 2018, 6:58:11 PM, Stephan Müller wrote:
>> 
>>> Am 04.07.2018 um 19:28 schrieb Daniel Dekany:
>>>> I wonder what parser libraries could help us, in FM3, to separate the
>>>> expression language parsing from the top-level language (like
>>>> `<#foo>`, `${...}`, etc.) parsing. Or if a hand-written parser is an
>>>> acceptable compromise. It would be good if we could change the top-level
>>>> syntax and still reuse the expression syntax. (Or, replace the
>>>> expression syntax, and reuse the top-level one.) Like, somebody wants
>>>> a syntax like `#foo(exp)` instead of `<#foo exp>`, but still reuse the
>>>> expression syntax. (For me it was always part of the FM3 agenda,
>>>> though it might prove to be too much...)
>>>> [..]
>>>
>>> During the last days I had a high-level look at different parser 
>>> generators, and as one might imagine, there are a lot of parser 
>>> generators, with different licenses, different maturities, different 
>>> states of maintenance and so on.
>>>
>>> Due to https://www.apache.org/legal/resolved.html I ignored all parser
>>> generators which may not be included in Apache projects because of their
>>> license, especially GNU GPL etc.
>>>
>>> IMHO this leaves us with:
>>>
>>> * LL(k) parsers: ANTLR, JavaCC and Grammatica
>>> * LALR parsers: CookCC
>>> * PEG parsers: Mouse
>>> * parser combinators: jparsec, parboiled and PetitParser
>>>
>>> This list is not exhaustive, so I have probably missed some interesting
>>> projects. If so, please share, I'd like to have a look at these, too.
>>>
>>> My idea for the next step: define a really small subset of FTL and try
>>> to implement PoCs for this subset with the candidates which I mentioned
>>> above.
>>>
>>> The subset might be something like
>>>
>>> * interpolations: ${..}
>>> * directives: if, assign
>>> * expressions: numbers, variables, +
>>> * variants of the parsers with different delimiters
>>> * split into two parsers (interpolations/directives vs. expression language)
>>>
>>> What do you think?
>>>
>>>
>>> Stephan.
>>>
>>> P.S.: my more detailed list of parser generators can be found here: 
>>> https://gist.github.com/chaquotay/8041096bad36f6f3f0d4166d6f8623b5
>>>
>> 
>
>

-- 
Thanks,
 Daniel Dekany

Re: Anybody interested in some FM3 parser research?

Posted by Stephan Müller <st...@notatoaster.org>.

Am 25.08.2018 um 09:44 schrieb Daniel Dekany:
> Any progress? (I understand if not, with a newborn and all, I'm just
> curious.)

Not a lot of progress, but at least I began looking into the first
library last weekend. I'll push my experimental code to github as soon
as it makes sense, so that everyone can have a look at it.

> Note my answer to Angelo. As I said there, maybe we should settle for
> having only the expression parsing done by the parser library, with the
> rest done "manually". While that removes most of the tricky requirements
> from the parser library, it brings in a new one: doing many
> independent expression parsings should have minimal overhead.

Yep, I noticed that part. I've done some manual parsing in the past
(simple top-down LL(k)), but in my limited experience things like good
error reporting and error recovery might be a little bit easier when
using a parser library which already has decent support for these kinds
of things. But it's definitely something to keep in mind as a "plan b".


Stephan.

> Sunday, August 5, 2018, 6:58:11 PM, Stephan Müller wrote:
> 
>> Am 04.07.2018 um 19:28 schrieb Daniel Dekany:
>>> I wonder what parser libraries could help us, in FM3, to separate the
>>> expression language parsing from the top-level language (like
>>> `<#foo>`, `${...}`, etc.) parsing. Or if a hand-written parser is an
>>> acceptable compromise. It would be good if we could change the top-level
>>> syntax and still reuse the expression syntax. (Or, replace the
>>> expression syntax, and reuse the top-level one.) Like, somebody wants
>>> a syntax like `#foo(exp)` instead of `<#foo exp>`, but still reuse the
>>> expression syntax. (For me it was always part of the FM3 agenda,
>>> though it might prove to be too much...)
>>> [..]
>>
>> During the last days I had a high-level look at different parser 
>> generators, and as one might imagine, there are a lot of parser 
>> generators, with different licenses, different maturities, different 
>> states of maintenance and so on.
>>
>> Due to https://www.apache.org/legal/resolved.html I ignored all parser
>> generators which may not be included in Apache projects because of their
>> license, especially GNU GPL etc.
>>
>> IMHO this leaves us with:
>>
>> * LL(k) parsers: ANTLR, JavaCC and Grammatica
>> * LALR parsers: CookCC
>> * PEG parsers: Mouse
>> * parser combinators: jparsec, parboiled and PetitParser
>>
>> This list is not exhaustive, so I have probably missed some interesting
>> projects. If so, please share, I'd like to have a look at these, too.
>>
>> My idea for the next step: define a really small subset of FTL and try
>> to implement PoCs for this subset with the candidates which I mentioned
>> above.
>>
>> The subset might be something like
>>
>> * interpolations: ${..}
>> * directives: if, assign
>> * expressions: numbers, variables, +
>> * variants of the parsers with different delimiters
>> * split into two parsers (interpolations/directives vs. expression language)
>>
>> What do you think?
>>
>>
>> Stephan.
>>
>> P.S.: my more detailed list of parser generators can be found here: 
>> https://gist.github.com/chaquotay/8041096bad36f6f3f0d4166d6f8623b5
>>
>

Re: Anybody interested in some FM3 parser research?

Posted by Daniel Dekany <dd...@apache.org>.

Any progress? (I understand if not, with a newborn and all, I'm just
curious.)

Note my answer to Angelo. As I said there, maybe we should settle for
having only the expression parsing done by the parser library, with the
rest done "manually". While that removes most of the tricky requirements
from the parser library, it brings in a new one: doing many
independent expression parsings should have minimal overhead.
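
The split described above could be sketched roughly as follows: a
hand-written top-level scanner finds the `${...}` interpolations and
hands each inner string to a separate expression parser. Everything here
is an assumption for illustration: the class TopLevelScanner is invented,
the expression parser is just a java.util.function.Function passed in by
the caller, and the search for the closing `}` ignores string literals
and nested braces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch (not FreeMarker code): the top-level syntax is scanned
// by hand, and only the text inside ${...} goes to the expression parser.
final class TopLevelScanner {
    /**
     * Splits the template into static text (String) and parsed expressions
     * (whatever the given expression parser returns), in source order.
     */
    static <E> List<Object> scan(String template, Function<String, E> expressionParser) {
        List<Object> parts = new ArrayList<>();
        int pos = 0;
        while (pos < template.length()) {
            int start = template.indexOf("${", pos);
            if (start < 0) {
                parts.add(template.substring(pos)); // trailing static text
                break;
            }
            // Simplification: the first '}' ends the interpolation.
            int end = template.indexOf('}', start + 2);
            if (end < 0) {
                throw new IllegalArgumentException("Unclosed ${ at index " + start);
            }
            if (start > pos) {
                parts.add(template.substring(pos, start)); // static text before ${
            }
            // One independent expression parse per interpolation.
            parts.add(expressionParser.apply(template.substring(start + 2, end)));
            pos = end + 1;
        }
        return parts;
    }
}
```

Since the expression parser is invoked once per interpolation, its
per-call setup cost is what matters for the "minimal overhead"
requirement; swapping the top-level syntax only means replacing the
scanner, not the function passed in.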


Sunday, August 5, 2018, 6:58:11 PM, Stephan Müller wrote:

> Am 04.07.2018 um 19:28 schrieb Daniel Dekany:
>> I wonder what parser libraries could help us, in FM3, to separate the
>> expression language parsing from the top-level language (like
>> `<#foo>`, `${...}`, etc.) parsing. Or if a hand-written parser is an
>> acceptable compromise. It would be good if we could change the top-level
>> syntax and still reuse the expression syntax. (Or, replace the
>> expression syntax, and reuse the top-level one.) Like, somebody wants
>> a syntax like `#foo(exp)` instead of `<#foo exp>`, but still reuse the
>> expression syntax. (For me it was always part of the FM3 agenda,
>> though it might prove to be too much...)
>> [..]
>
> During the last days I had a high-level look at different parser 
> generators, and as one might imagine, there are a lot of parser 
> generators, with different licenses, different maturities, different 
> states of maintenance and so on.
>
> Due to https://www.apache.org/legal/resolved.html I ignored all parser
> generators which may not be included in Apache projects because of their
> license, especially GNU GPL etc.
>
> IMHO this leaves us with:
>
> * LL(k) parsers: ANTLR, JavaCC and Grammatica
> * LALR parsers: CookCC
> * PEG parsers: Mouse
> * parser combinators: jparsec, parboiled and PetitParser
>
> This list is not exhaustive, so I have probably missed some interesting
> projects. If so, please share, I'd like to have a look at these, too.
>
> My idea for the next step: define a really small subset of FTL and try
> to implement PoCs for this subset with the candidates which I mentioned
> above.
>
> The subset might be something like
>
> * interpolations: ${..}
> * directives: if, assign
> * expressions: numbers, variables, +
> * variants of the parsers with different delimiters
> * split into two parsers (interpolations/directives vs. expression language)
>
> What do you think?
>
>
> Stephan.
>
> P.S.: my more detailed list of parser generators can be found here: 
> https://gist.github.com/chaquotay/8041096bad36f6f3f0d4166d6f8623b5
>

-- 
Thanks,
 Daniel Dekany