Posted to dev@harmony.apache.org by Stepan Mishura <st...@gmail.com> on 2006/08/23 12:10:55 UTC

Re: [classlib][html] Please evaluate proposed ASN.1 notation for HTML DTD

Hi Miguel,

I've looked through the proposed ASN.1 notation and it looks OK to me. I have
only a few comments.
(However, I don't know all the details of the DTD, i.e. I have not checked
whether your notation represents the DTD correctly, so I'll comment only on
the proposed ASN.1 notation.)

BTW, I've changed the subject if you don't mind.

General remark: the name of a component of a SEQUENCE or SET should start
with a lower-case letter.
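
As a rough illustration of the naming rule, the Python sketch below checks
whether a name is a valid ASN.1 identifier and lower-cases the first letter if
it is not. The helper names (is_identifier, to_identifier) are hypothetical and
not part of any Harmony code; the pattern follows my reading of the identifier
rule in X.680.

```python
import re

# ASN.1 identifier: starts with a lower-case letter, continues with letters,
# digits and single hyphens, and may not end with a hyphen.
IDENTIFIER = re.compile(r"[a-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*")


def is_identifier(name: str) -> bool:
    """Return True if name is a well-formed ASN.1 component identifier."""
    return IDENTIFIER.fullmatch(name) is not None


def to_identifier(name: str) -> str:
    """Lower-case only the first character, e.g. 'OStart' -> 'oStart'."""
    return name[:1].lower() + name[1:]


# Component names taken from the proposed notation.
for original in ["Name", "OStart", "ContentModel"]:
    print(original, is_identifier(original), to_identifier(original))
```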

My other comments are below.

On 8/23/06, Miguel Montes wrote:

> Hi:
> We are working on the html parser, and need to have working DTD. The current
> implementation of DTD.read(), based on serialization, has some problems, and
> I think we should have a well defined binary format. I suggest the following
> ASN.1 format, and if there is consensus on it, we could contribute the code
> to read and write it.
> I would like to hear the opinion of Stepan and anyone who has worked with
> ASN.1 before.
>
> BDTD ::= SEQUENCE {
>       Name UTF8String,
>       Entity SET OF HTMLEntity,
>       Element SET OF HTMLElement
> }
>
> HTMLEntity ::= SEQUENCE {
>       Name UTF8String,
>       Value INTEGER,
>       General BOOLEAN DEFAULT FALSE,
>       Parameter BOOLEAN DEFAULT FALSE,
>       Data UTF8String
> }


This won't work. I'll try to explain. We have two DEFAULT components here. A
component declared as DEFAULT is also optional in the encoding: it may be
omitted. A decoder can tell which component was omitted only if, within a run
of consecutive OPTIONAL/DEFAULT components plus the next mandatory component,
all elements have distinct tags.

We have the following block:
general      BOOLEAN       DEFAULT FALSE
parameter    BOOLEAN       DEFAULT FALSE
data         UTF8String

So the first and second elements are not distinct: both carry the universal
BOOLEAN tag. This can be fixed by tagging some elements. I'd use implicit
tagging, for example:

general                      BOOLEAN    DEFAULT FALSE
parameter    [0] IMPLICIT    BOOLEAN    DEFAULT FALSE

or

general      [0] IMPLICIT    BOOLEAN    DEFAULT FALSE
parameter    [1] IMPLICIT    BOOLEAN    DEFAULT FALSE
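
To make the ambiguity concrete, here is a minimal Python sketch, not the
Harmony ASN.1 code. It hand-encodes only the general/parameter/data tail of
HTMLEntity as DER-style tag-length-value triples, assuming short-form lengths
and that components equal to their DEFAULT are omitted. The helper names (tlv,
encode_tail) are made up for this example.

```python
BOOLEAN_TAG = 0x01     # universal tag for BOOLEAN
UTF8STRING_TAG = 0x0C  # universal tag for UTF8String
CTX0 = 0x80            # [0] IMPLICIT applied to a primitive type
CTX1 = 0x81            # [1] IMPLICIT applied to a primitive type


def tlv(tag: int, content: bytes) -> bytes:
    """Encode one tag-length-value triple (short-form length only)."""
    assert len(content) < 128
    return bytes([tag, len(content)]) + content


def encode_tail(general: bool, parameter: bool, data: str,
                general_tag: int, parameter_tag: int) -> bytes:
    """Encode general, parameter, data; FALSE (the DEFAULT) is omitted."""
    out = b""
    if general:
        out += tlv(general_tag, b"\xff")    # DER encodes TRUE as 0xFF
    if parameter:
        out += tlv(parameter_tag, b"\xff")
    return out + tlv(UTF8STRING_TAG, data.encode("utf-8"))


# Untagged: both components share the universal BOOLEAN tag, so
# "general only" and "parameter only" produce the same bytes.
a = encode_tail(True, False, "x", BOOLEAN_TAG, BOOLEAN_TAG)
b = encode_tail(False, True, "x", BOOLEAN_TAG, BOOLEAN_TAG)
print(a.hex(), b.hex(), a == b)

# Tagged with [0] and [1]: the first octet tells the decoder which
# component is present.
c = encode_tail(True, False, "x", CTX0, CTX1)
d = encode_tail(False, True, "x", CTX0, CTX1)
print(c.hex(), d.hex(), c == d)
```

In the untagged case the two encodings are byte-for-byte identical, so no
decoder could recover which flag was set; with the context tags they differ in
the first octet.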

Thanks,
Stepan.

P.S. I'll let you know if I have more corrections.





> HTMLElement ::= SEQUENCE {
>       Index INTEGER,
>       Name UTF8String,
>       Type INTEGER,
>       OStart BOOLEAN,
>       OEnd BOOLEAN,
>       Exclusions SET OF INTEGER,
>       Inclusions SET OF INTEGER,
>       Attributes SET OF HTMLElementAttributes OPTIONAL,
>       ContentModel HTMLContentModel,
> }
>
> HTMLContentModel ::= SEQUENCE OF SEQUENCE {
>       Type INTEGER,
>       Index INTEGER
> }
>
> HTMLElementAttributes ::= SEQUENCE {
>       Name UTF8String,
>       Type INTEGER,
>       Modifier INTEGER,
>       DefaultValue UTF8String OPTIONAL,
>       PossibleValues SET OF UTF8String OPTIONAL
> }
> --
> Miguel Montes
>
>

------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org

Re: [classlib][html] Please evaluate proposed ASN.1 notation for HTML DTD

Posted by Stepan Mishura <st...@gmail.com>.
On 8/24/06, Miguel Montes wrote:
>
> Thanks Stepan. So, it should be
>
> BDTD ::= SEQUENCE {
>        name UTF8String,
>        entity SET OF HTMLEntity,
>        element SET OF HTMLElement
> }
>
> HTMLEntity ::= SEQUENCE {
>        name UTF8String,
>        value INTEGER,
>        general [0] IMPLICIT BOOLEAN DEFAULT FALSE,
>        parameter [1] IMPLICIT BOOLEAN DEFAULT FALSE,
>        data UTF8String
> }
>
> HTMLElement ::= SEQUENCE {
>        index INTEGER,
>        name UTF8String,
>        type INTEGER,
>        oStart BOOLEAN,
>        oEnd BOOLEAN,
>        exclusions SET OF INTEGER,
>        inclusions SET OF INTEGER,
>        attributes SET OF HTMLElementAttributes OPTIONAL,
>        contentModel HTMLContentModel
> }
>
> HTMLContentModel ::= SEQUENCE OF SEQUENCE {
>        type INTEGER,
>        index INTEGER
> }
>
> HTMLElementAttributes ::= SEQUENCE {
>        name UTF8String,
>        type INTEGER,
>        modifier INTEGER,
>        defaultValue UTF8String OPTIONAL,
>        possibleValues SET OF UTF8String OPTIONAL
> }
>
>
> If we want exclusions and inclusions in HTMLElement to be optional, it
> should be something like
>
> HTMLElement ::= SEQUENCE {
>        index INTEGER,
>        name UTF8String,
>        type INTEGER,
>        oStart BOOLEAN,
>        oEnd BOOLEAN,
>        exclusions [0] IMPLICIT SET OF INTEGER OPTIONAL,
>        inclusions [1] IMPLICIT SET OF INTEGER OPTIONAL,
>        attributes SET OF HTMLElementAttributes OPTIONAL,
>        contentModel HTMLContentModel
> }
>
> Is this right?


Yes, that is right.
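
As a quick sanity check, the Python sketch below computes the identifier octet
each trailing HTMLElement component would have in BER/DER, assuming the
low-tag-number form (tag numbers below 31). The helper name identifier_octet is
hypothetical. The point is only that the run exclusions, inclusions, attributes
plus the mandatory contentModel has pairwise distinct tags.

```python
UNIVERSAL = 0x00    # class bits for universal tags
CONTEXT = 0x80      # class bits for context-specific tags
CONSTRUCTED = 0x20  # set for SEQUENCE, SET and their OF forms


def identifier_octet(tag_class: int, constructed: bool, number: int) -> int:
    """Build a single identifier octet (low-tag-number form only)."""
    assert 0 <= number < 31
    return tag_class | (CONSTRUCTED if constructed else 0) | number


tags = {
    # [0] IMPLICIT SET OF INTEGER OPTIONAL
    "exclusions": identifier_octet(CONTEXT, True, 0),
    # [1] IMPLICIT SET OF INTEGER OPTIONAL
    "inclusions": identifier_octet(CONTEXT, True, 1),
    # SET OF HTMLElementAttributes OPTIONAL (universal tag 17)
    "attributes": identifier_octet(UNIVERSAL, True, 17),
    # HTMLContentModel is a SEQUENCE OF (universal tag 16)
    "contentModel": identifier_octet(UNIVERSAL, True, 16),
}

for name, octet in tags.items():
    print(f"{name:13s} 0x{octet:02X}")

# All four identifier octets differ, so a decoder can tell which optional
# components were omitted.
print(len(set(tags.values())) == len(tags))
```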

- Stepan.


Re: [classlib][html] Please evaluate proposed ASN.1 notation for HTML DTD

Posted by Miguel Montes <mi...@gmail.com>.
Thanks Stepan. So, it should be

BDTD ::= SEQUENCE {
	name UTF8String,	
	entity SET OF HTMLEntity,
	element SET OF HTMLElement
}

HTMLEntity ::= SEQUENCE {
	name UTF8String,
	value INTEGER,
	general [0] IMPLICIT BOOLEAN DEFAULT FALSE,
	parameter [1] IMPLICIT BOOLEAN DEFAULT FALSE,
	data UTF8String
}

HTMLElement ::= SEQUENCE {
	index INTEGER,
	name UTF8String,
	type INTEGER,
	oStart BOOLEAN,
	oEnd BOOLEAN,
	exclusions SET OF INTEGER,
	inclusions SET OF INTEGER,
	attributes SET OF HTMLElementAttributes OPTIONAL,
	contentModel HTMLContentModel
}

HTMLContentModel ::= SEQUENCE OF SEQUENCE {
	type INTEGER,
	index INTEGER
}

HTMLElementAttributes ::= SEQUENCE {
	name UTF8String,
	type INTEGER,
	modifier INTEGER,
	defaultValue UTF8String OPTIONAL,
	possibleValues SET OF UTF8String OPTIONAL
}


If we want exclusions and inclusions in HTMLElement to be optional, it
should be something like

HTMLElement ::= SEQUENCE {
	index INTEGER,
	name UTF8String,
	type INTEGER,
	oStart BOOLEAN,
	oEnd BOOLEAN,
	exclusions [0] IMPLICIT SET OF INTEGER OPTIONAL,
	inclusions [1] IMPLICIT SET OF INTEGER OPTIONAL,
	attributes SET OF HTMLElementAttributes OPTIONAL,
	contentModel HTMLContentModel
}

Is this right?




-- 
Miguel Montes