
Posted to dev@directory.apache.org by Emmanuel Lecharny <el...@iktek.com> on 2005/02/04 20:31:18 UTC

Antlr or simple code for parsing DN?

Hi,

Just a question: is it really necessary to use a tool like ANTLR to
parse a grammar that can be parsed "by hand"?

I mean, parsing this kind of grammar (DN) with ANTLR raises many
problems because of its LL(k) lexer. LL(k) grammars are GOOD, but
applied to a lexer they lead to many tricky adjustments.
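
To make the lexer problem concrete: in a DN such as "cn=2.5.4.3, 2.5.4.3=cn", the characters "2.5.4.3" are an OID in the attribute-type position but ordinary text in the value position, so a lexer that only looks at characters cannot easily tell which token to emit. A hand-written scanner can simply carry a mode flag. The class below is a hypothetical sketch (not Apache DS or ANTLR code) that ignores escapes, quoted values and "#" hex values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-mode DN scanner: the same characters are classified
// differently depending on whether we are before '=' (attribute type)
// or after it (attribute value). Escapes and quoting are ignored here.
final class ModalDnScanner {
    enum Kind { TYPE_NAME, TYPE_OID, EQUALS, VALUE, SEPARATOR }

    record Token(Kind kind, String text) { }

    static List<Token> scan(String dn) {
        List<Token> tokens = new ArrayList<>();
        boolean inValue = false;  // the context a character-only lexer lacks
        int i = 0;
        while (i < dn.length()) {
            char c = dn.charAt(i);
            if (c == ',' || c == '+') {
                // RDN separator or multi-valued RDN separator: back to type mode
                tokens.add(new Token(Kind.SEPARATOR, String.valueOf(c)));
                inValue = false;
                i++;
            } else if (inValue) {
                // Everything up to the next separator is value text, even "2.5.4.3"
                int start = i;
                while (i < dn.length() && dn.charAt(i) != ',' && dn.charAt(i) != '+') i++;
                tokens.add(new Token(Kind.VALUE, dn.substring(start, i).trim()));
            } else if (c == '=') {
                tokens.add(new Token(Kind.EQUALS, "="));
                inValue = true;
                i++;
            } else if (c == ' ') {
                i++;  // spaces around the attribute type are insignificant
            } else {
                // Attribute type: a leading ASCII digit means a dotted OID
                int start = i;
                while (i < dn.length() && "=, +".indexOf(dn.charAt(i)) < 0) i++;
                String text = dn.substring(start, i);
                char first = text.charAt(0);
                Kind kind = (first >= '0' && first <= '9') ? Kind.TYPE_OID : Kind.TYPE_NAME;
                tokens.add(new Token(kind, text));
            }
        }
        return tokens;
    }
}
```

Here "2.5.4.3" comes out once as VALUE and once as TYPE_OID. Getting the same distinction out of a character-driven LL(k) lexer generally needs extra machinery, such as predicates or state shared with the parser.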

As discussed with Alan, a DN string can be very hard to parse, and it
must be "normalized" before two DNs can be compared. But normalization
should occur at the lexer level (correct me if I'm wrong), not at the
parser level, if the normalization process only means "lowercasing" hex
values and "octothorping" OID values (sorry for those horrible
neologisms).

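A minimal sketch of the "lowercasing hex values" step, assuming it is applied to an attribute value written in the RFC 2253 "#" hexstring form (for example "#04024A69") as soon as the lexer has read it. The class and method names are hypothetical, and the other normalization step mentioned above is not covered.

```java
// Hypothetical lexer-level normalization of an RFC 2253 "#hexstring" value:
// hex digits are lowercased so that "#04024A69" and "#04024a69" compare equal.
final class HexValueNormalizer {
    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static String normalize(String value) {
        if (value.isEmpty() || value.charAt(0) != '#') {
            return value;  // not a hexstring: left untouched by this sketch
        }
        int digits = value.length() - 1;
        if (digits == 0 || digits % 2 != 0) {
            throw new IllegalArgumentException("hexstring must contain whole hex pairs: " + value);
        }
        StringBuilder out = new StringBuilder(value.length()).append('#');
        for (int i = 1; i < value.length(); i++) {
            char c = value.charAt(i);
            if (!isHexDigit(c)) {
                throw new IllegalArgumentException("not a hex digit '" + c + "' in " + value);
            }
            out.append(Character.toLowerCase(c));
        }
        return out.toString();
    }
}
```

Doing this while the value is being scanned means the parser, and any comparison code after it, only ever sees the lowercase form.
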
Thus, it costs time and energy to fix this grammar so that it handles
every kind of valid AND invalid string correctly.

At what point should we consider that it costs more than a hand-written
Java parser (including maintenance costs, of course)?
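
For comparison, here is a rough idea of what a hand-written Java parser can look like: a small recursive-descent parser with one method per grammar rule, over a simplified subset of the RFC 2253 DN syntax. It is a hypothetical sketch, neither the private parser mentioned in this message nor Apache DS code. It handles name or dotted-OID types, "+" multi-valued RDNs, "," or ";" separators, backslash-escaped special characters and "#" hex values, and it rejects quoted values and "\xx" hex-pair escapes rather than guessing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical recursive-descent parser for a simplified RFC 2253 DN grammar,
// one method per rule. Quoted values and "\xx" hex-pair escapes are rejected.
final class SimpleDnParser {
    record Ava(String type, String value) { }  // one attributeTypeAndValue
    private final String in;
    private int pos;
    private SimpleDnParser(String in) { this.in = in; }

    // distinguishedName = [ rdn *( ( "," / ";" ) rdn ) ]
    static List<List<Ava>> parse(String dn) {
        SimpleDnParser p = new SimpleDnParser(dn);
        List<List<Ava>> rdns = new ArrayList<>();
        p.skipSpaces();
        if (p.atEnd()) return rdns;  // the empty DN
        rdns.add(p.rdn());
        while (!p.atEnd()) {
            char sep = p.next();
            if (sep != ',' && sep != ';') throw p.error("expected ',' or ';'");
            rdns.add(p.rdn());
        }
        return rdns;
    }

    // rdn = attributeTypeAndValue *( "+" attributeTypeAndValue )
    private List<Ava> rdn() {
        List<Ava> avas = new ArrayList<>();
        avas.add(ava());
        while (!atEnd() && peek() == '+') {
            pos++;
            avas.add(ava());
        }
        return avas;
    }

    // attributeTypeAndValue = attributeType "=" attributeValue
    private Ava ava() {
        skipSpaces();
        String type = type();
        skipSpaces();
        if (atEnd() || next() != '=') throw error("expected '='");
        skipSpaces();
        return new Ava(type, value());
    }

    // attributeType = ALPHA *( ALPHA / DIGIT / "-" ) / 1*DIGIT *( "." 1*DIGIT )
    private String type() {
        int start = pos;
        while (!atEnd() && (isAsciiAlnum(peek()) || peek() == '-' || peek() == '.')) pos++;
        String t = in.substring(start, pos);
        if (!t.matches("[A-Za-z][A-Za-z0-9-]*") && !t.matches("[0-9]+(\\.[0-9]+)*")) {
            throw error("bad attribute type '" + t + "'");
        }
        return t.toLowerCase(Locale.ROOT);  // attribute types are case-insensitive
    }

    // attributeValue = "#" hexstring / string
    private String value() {
        if (!atEnd() && peek() == '#') {
            int start = pos++;
            while (!atEnd() && isHex(peek())) pos++;
            int digits = pos - start - 1;
            if (digits == 0 || digits % 2 != 0) throw error("bad hexstring");
            String hex = in.substring(start, pos).toLowerCase(Locale.ROOT);
            skipSpaces();
            return hex;
        }
        StringBuilder sb = new StringBuilder();
        int keep = 0;  // length up to the last significant character
        while (!atEnd() && ",;+".indexOf(peek()) < 0) {
            char c = next();
            if (c == '\\') {
                if (atEnd() || ",=+<>#;\\\" ".indexOf(peek()) < 0) throw error("unsupported escape");
                sb.append(next());
                keep = sb.length();  // escaped characters, even spaces, are significant
            } else if ("=<>\"".indexOf(c) >= 0) {
                throw error("unescaped '" + c + "'");
            } else {
                sb.append(c);
                if (c != ' ') keep = sb.length();
            }
        }
        sb.setLength(keep);  // drop unescaped trailing spaces
        return sb.toString();
    }

    private boolean atEnd() { return pos >= in.length(); }
    private char peek() { return in.charAt(pos); }
    private char next() { return in.charAt(pos++); }
    private void skipSpaces() { while (!atEnd() && peek() == ' ') pos++; }
    private static boolean isHex(char c) { return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F'); }
    private static boolean isAsciiAlnum(char c) { return isHex(c) || (c >= 'g' && c <= 'z') || (c >= 'G' && c <= 'Z'); }
    private IllegalArgumentException error(String msg) { return new IllegalArgumentException(msg + " at " + pos + " in \"" + in + "\""); }
}
```

On the RFC 2253 example "OU=Sales+CN=J. Smith,O=Widget Inc.,C=US" this sketch returns three RDNs, the first containing the two AVAs (ou, Sales) and (cn, J. Smith).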

For instance, the .g files used to parse DNs are currently 400 lines
long. My own private DN parser, which I wrote 2 weeks ago (way before I
knew that Apache DS was given half-life), is 700 lines long.

What do you think?

(Of course, it is my almost total ignorance of ANTLR that prompts me to
write this message ;-)

Cheers,
Emmanuel

Re: Antlr or simple code for parsing DN?

Posted by Alex Karasulu <ao...@bellsouth.net>.

You may be right. It does not hurt to try; we can leave the parser
grammars where they are in case you change your mind. Then we can swap
one out for the other under the hood. If you look at the DnParser
implementation, which just wraps the ANTLR-generated parser, you'll see
that swapping is trivial. Also, if you write something faster and easier
to maintain, then I'm all for it.
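
Swapping implementations under the hood only requires that callers depend on a thin wrapper rather than on the generated parser itself. A minimal sketch of that idea follows; PluggableDnParser, Engine and splitOnUnescapedCommas are hypothetical names, not the actual DnParser API, and the toy engine only splits a DN into RDN strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical facade: callers depend only on parse(); the engine behind it
// (an ANTLR-generated parser or a hand-written one) can be swapped freely.
final class PluggableDnParser {
    // Anything that turns a DN string into its RDN strings.
    interface Engine {
        List<String> rdns(String dn);
    }

    private final Engine engine;

    PluggableDnParser(Engine engine) {
        this.engine = engine;
    }

    List<String> parse(String dn) {
        if (dn == null) {
            throw new IllegalArgumentException("dn must not be null");
        }
        return engine.rdns(dn);
    }

    // Toy hand-written engine: splits on unescaped ',' and trims each RDN.
    // It does not validate RDNs, handle quoting, or accept ';' separators.
    static List<String> splitOnUnescapedCommas(String dn) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < dn.length(); i++) {
            char c = dn.charAt(i);
            if (c == '\\' && i + 1 < dn.length()) {
                current.append(c).append(dn.charAt(++i));  // keep the escape pair intact
            } else if (c == ',') {
                out.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (!dn.isEmpty()) {
            out.add(current.toString().trim());
        }
        return out;
    }
}
```

An ANTLR-backed engine would be a second Engine implementation wrapping the generated parser; code calling parse() would not change when one is replaced by the other.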

I want to start profiling stuff too.  Gotta find a nice tool for this.

Cheers,
Alex