Posted to dev@asterixdb.apache.org by Ian Maxon <im...@uci.edu> on 2016/06/08 03:17:13 UTC

The "real" ADM format

Hi all,
After having to fix a rather large ADM file dump from a query so that it
would load back into the system, I was compelled to try my hand at making
sure that doesn't happen again. The first thing I tried was basically what
I did to make the file loadable, but inside the type printers: just remove
all of the 'i32' and similar suffixes, and stop formatting decimals in
scientific notation. This is pretty easy to do and not a huge change
code-wise (but obviously I'll have to fix all of the tests).
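
For illustration only, the manual cleanup described above could look
roughly like the sketch below. It is not the type-printer change itself:
the function name, the regular expressions, and the assumption that
integer suffixes are exactly i8/i16/i32/i64 are all hypothetical, and the
sketch works on text rather than on the printers.

```python
import re
from decimal import Decimal

# Hypothetical helper, not AsterixDB code. The text is split on
# double-quoted strings so only the parts outside string literals change.
_STRING = re.compile(r'("(?:[^"\\]|\\.)*")')
# Integer literal followed by an assumed width suffix: i8, i16, i32, i64.
_INT_SUFFIX = re.compile(r'(-?\d+)i(?:8|16|32|64)\b')
# Number in scientific notation, e.g. 1.5E3 or 2e-2.
_SCIENTIFIC = re.compile(r'-?\d+(?:\.\d+)?[eE][+-]?\d+')


def strip_type_decorations(adm_text: str) -> str:
    """Drop integer width suffixes and expand scientific notation."""
    parts = _STRING.split(adm_text)
    # re.split with one capturing group alternates: outside, string, outside...
    for i in range(0, len(parts), 2):
        chunk = _INT_SUFFIX.sub(r'\1', parts[i])
        chunk = _SCIENTIFIC.sub(
            lambda m: format(Decimal(m.group(0)), 'f'), chunk)
        parts[i] = chunk
    return ''.join(parts)


print(strip_type_decorations('{ "id": 7i32, "name": "x9i32", "score": 1.5E3 }'))
```

Note that once the suffix is gone the loader has to get the integer width
from the declared type, which is exactly the trade-off raised below.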

This got me thinking, though: which format do we actually want? The
current output format, or the format that we accept in the loader?
Since this is perhaps a language-level change either way, I figured
I should find consensus before spending more time on it.

Thoughts/comments are appreciated.

Thanks,
- Ian

Re: The "real" ADM format

Posted by Ian Maxon <im...@uci.edu>.

I think the int suffixes can be made to work; however, there is an issue
with the suffixes for floats and doubles. First, the existing grammar
doesn't handle a suffix for doubles at all, only for floats. Second, "NaN"
and "Infinity" are valid values for a double, but making those work with a
suffix doesn't seem trivial to me.
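
To make the difficulty concrete, here is a rough sketch of a literal
classifier. Everything in it is an assumption for illustration: the real
rules live in adm.grammar, the 'f' and 'd' suffix letters are hypothetical,
and treating an unsuffixed decimal as a double is a guess. The point is
only that "NaN" and "Infinity" do not fit a "digit sequence plus suffix"
rule and need their own alternatives.

```python
import re

# Assumed literal shapes, for illustration only (not taken from adm.grammar).
_NUMERIC = re.compile(
    r'(?P<body>-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|-?Infinity|NaN)'
    r'(?P<suffix>i8|i16|i32|i64|f|d)?$'
)


def classify(literal: str) -> str:
    """Return an assumed type name for a numeric literal, or raise ValueError."""
    m = _NUMERIC.match(literal)
    if m is None:
        raise ValueError('not a numeric literal: ' + literal)
    body, suffix = m.group('body'), m.group('suffix')
    is_integer = body.lstrip('-').isdigit()
    if suffix is not None and suffix.startswith('i'):
        # Integer suffixes only make sense after a plain digit sequence.
        if not is_integer:
            raise ValueError('integer suffix on non-integer: ' + literal)
        return 'int' + suffix[1:]
    if suffix == 'f':
        return 'float'
    if suffix == 'd':
        return 'double'
    # No suffix: assume a naked integer is an int64, anything else a double.
    return 'int64' if is_integer else 'double'


for text in ('12i8', '12', '1.5f', 'NaN', '-Infinity'):
    print(text, '->', classify(text))
```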

On Wed, Jun 15, 2016 at 3:52 PM, Ian Maxon <im...@uci.edu> wrote:

> I've been looking at this a bit more, it turns out adm.grammar in
> asterix-external-data is the "real" ADM format. It is supposed to
> always accept suffixes of i8/16/32/etc after a digit sequence, but
> something must be wrong with how the grammar is being translated. It
> also appears that in some circumstances the parser can be coaxed into
> taking the output. Therefore it seems to me at this time that the real
> deficiency is in lexer-generator-maven-plugin and not elsewhere.

Re: The "real" ADM format

Posted by Ian Maxon <im...@uci.edu>.

I've been looking at this a bit more, and it turns out adm.grammar in
asterix-external-data is the "real" ADM format. It is supposed to
always accept suffixes of i8/16/32/etc. after a digit sequence, but
something must be wrong with how the grammar is being translated. It
also appears that in some circumstances the parser can be coaxed into
taking the output. Therefore it seems to me at this time that the real
deficiency is in lexer-generator-maven-plugin and not elsewhere.

On 6/8/16, Ian Maxon <im...@uci.edu> wrote:
> I guess I don't view the round-trippability in the same way then, all it
> means to me is that I can scan/output the data, load it, and end up with
> the same thing, not necessarily that I can load it without specifying the
> types and get them anyway because they're inlined to the data. I think if
> we want that the better thing to do would be to do something like mysqldump
> (e.g. it dumps the metadata/types as an equivalent query basically). Also,
> if we changed the format to conflict with the existing output of SocialGen
> we'd have issues with current experiments and reproducing old results.

Re: The "real" ADM format

Posted by Ian Maxon <im...@uci.edu>.

I guess I don't view round-trippability in the same way, then. All it
means to me is that I can scan/output the data, load it, and end up with
the same thing, not necessarily that I can load it without specifying the
types and get them anyway because they're inlined in the data. If we want
that, I think the better thing would be to do something like mysqldump
(e.g. it dumps the metadata/types as an equivalent query, basically). Also,
if we changed the format to conflict with the existing output of SocialGen,
we'd have issues with current experiments and reproducing old results.

On Wed, Jun 8, 2016 at 1:17 PM, Chris Hillery <ch...@hillery.land> wrote:

> I think the answer there is "round-tripability", right? ADM is meant to
> exactly describe the data so that it can be reloaded in the same way it
> was. Someone correct me if that isn't a requirement of the format...
>
> Ceej

Re: The "real" ADM format

Posted by Chris Hillery <ch...@hillery.land>.

I think the answer there is "round-tripability", right? ADM is meant to
exactly describe the data so that it can be reloaded in the same way it
was. Someone correct me if that isn't a requirement of the format...

Ceej
On Jun 8, 2016 9:14 AM, "Ian Maxon" <im...@uci.edu> wrote:

> Why should the type be intermingled with the data though when it isn't
> strictly necessary? For example why do I care if someone used an int64 to
> wrap something I know is actually a short integer, and so on. It also kind
> of gets rid of the idea of ADM being a superset of JSON.

Re: The "real" ADM format

Posted by Ian Maxon <im...@uci.edu>.

Why should the type be intermingled with the data, though, when it isn't
strictly necessary? For example, why do I care if someone used an int64 to
wrap something I know is actually a short integer, and so on? It also kind
of gets rid of the idea of ADM being a superset of JSON.
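
As a quick way to see the JSON point, the sketch below checks whether a
record parses as plain JSON using Python's standard json module. The helper
name and the sample records are made up for illustration; nothing here
comes from the AsterixDB code base.

```python
import json


def is_plain_json(record_text: str) -> bool:
    """Return True if the text parses with Python's standard JSON parser."""
    try:
        json.loads(record_text)
    except json.JSONDecodeError:
        return False
    return True


print(is_plain_json('{"id": 7, "name": "x"}'))     # record without suffixes
print(is_plain_json('{"id": 7i32, "name": "x"}'))  # record with an i32 suffix
```

One caveat: Python's parser is lenient and accepts NaN and Infinity by
default, so this is not a strict conformance check.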

On Tue, Jun 7, 2016 at 10:49 PM, Preston Carman <pr...@apache.org> wrote:

> The interval type format has been finalized and is the same for AQL
> and ADM. Below is an example of the format:
>
> interval(date("01-01-2011"), date("02-02-2012"))
>
> The interval constructor now uses other data type constructors to
> recreate an interval. The type of interval is defined by the two
> matching arguments.

Re: The "real" ADM format

Posted by Preston Carman <pr...@apache.org>.

The interval type format has been finalized and is the same for AQL
and ADM. Below is an example of the format:

interval(date("01-01-2011"), date("02-02-2012"))

The interval constructor now uses the other data type constructors to
build an interval. The type of the interval is determined by its two
arguments, which must be of the same type.
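
For illustration only, here is a minimal Python sketch of the rule described
above: an interval literal wraps two constructor calls, and the two
constructors must match. The function name parse_interval, the regular
expression, and the set of allowed constructor names (date, time, datetime)
are assumptions made for this sketch; they are not taken from the AsterixDB
parser.

```python
import re

# Assumed shape of an interval literal: interval(ctor("..."), ctor("...")).
INTERVAL = re.compile(
    r'\s*interval\(\s*(\w+)\("([^"]*)"\)\s*,\s*(\w+)\("([^"]*)"\)\s*\)\s*'
)

# Assumption: these are the constructors an interval may be built from.
ALLOWED = {"date", "time", "datetime"}


def parse_interval(text):
    """Return (type_name, start_text, end_text) or raise ValueError."""
    m = INTERVAL.fullmatch(text)
    if m is None:
        raise ValueError("not an interval literal: %r" % text)
    start_type, start, end_type, end = m.groups()
    if start_type not in ALLOWED:
        raise ValueError("unsupported constructor: %s" % start_type)
    # The interval type is defined by the two arguments, so they must match.
    if start_type != end_type:
        raise ValueError("mismatched arguments: %s vs %s" % (start_type, end_type))
    return start_type, start, end


if __name__ == "__main__":
    print(parse_interval('interval(date("01-01-2011"), date("02-02-2012"))'))
```

The sketch does not validate the quoted strings themselves; it only checks
the structure of the literal and that both arguments use the same constructor.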


On Tue, Jun 7, 2016 at 9:36 PM, Chris Hillery <ch...@hillery.land> wrote:
> Ah, the other thing I forgot to mention is that I didn't include interval
> types, because I'm not sure about their current status. There was some
> discussion on the list in January (subject "Round Tripping ADM Interval
> Data") but I'm not sure where it ended up as far as the form of the
> constructors, and whether that was AQL or ADM or both.
>
> Ceej
> aka Chris Hillery

Re: The "real" ADM format

Posted by Chris Hillery <ch...@hillery.land>.

Ah, the other thing I forgot to mention is that I didn't include interval
types, because I'm not sure about their current status. There was some
discussion on the list in January (subject "Round Tripping ADM Interval
Data") but I'm not sure where it ended up as far as the form of the
constructors, and whether that was AQL or ADM or both.

Ceej
aka Chris Hillery

On Tue, Jun 7, 2016 at 9:34 PM, Chris Hillery <ch...@hillery.land> wrote:

> I started to create the current inventory of types, with the forms
> accepted / produced by the ADM parser, AQL parser, and ADM serialization.
> (I think we all agree that ADM parser and ADM serializer should be 100%
> compatible.) Here it is:
>
>
> https://docs.google.com/spreadsheets/d/1-11a9ETV1Bdh_bUm9_CszY4hEGJGbEBaVKUWrzeS-As/edit?usp=sharing
>
> I know this is not comprehensive (for instance, I'm pretty sure that a
> naked integer will be parsed by both ADM and AQL as an int64, so that form
> should be listed as an alternative) and I haven't verified that the AQL
> parser forms in particular are accurate, but I think it's close. I've set
> it so anyone can edit that document, so please fill in the gaps if you know
> of any.
>
> We should also fill in the exact accepted forms for the various derived
> types like the datetime, spatial, hex, and UUID types - e.g., the valid
> forms of the double-quoted string in the duration() constructor are as
> specified by XML Schema, and so on.
>
> Ceej
> aka Chris Hillery

Re: The "real" ADM format

Posted by Chris Hillery <ch...@hillery.land>.

I started to create the current inventory of types, with the forms accepted
/ produced by the ADM parser, AQL parser, and ADM serialization. (I think
we all agree that ADM parser and ADM serializer should be 100% compatible.)
Here it is:

https://docs.google.com/spreadsheets/d/1-11a9ETV1Bdh_bUm9_CszY4hEGJGbEBaVKUWrzeS-As/edit?usp=sharing

I know this is not comprehensive (for instance, I'm pretty sure that a
naked integer will be parsed by both ADM and AQL as an int64, so that form
should be listed as an alternative) and I haven't verified that the AQL
parser forms in particular are accurate, but I think it's close. I've set
it so anyone can edit that document, so please fill in the gaps if you know
of any.
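
As a rough illustration of the integer forms under discussion, the Python
sketch below classifies a literal by its suffix and treats a naked integer
as int64. The int64 default follows the guess stated above and has not been
checked against either parser; the function name classify_int and the
regular expression are hypothetical.

```python
import re

# Digit sequence with an optional i8/i16/i32/i64 suffix (assumed form).
INT_LITERAL = re.compile(r'([+-]?\d+)(?:i(8|16|32|64))?')


def classify_int(token):
    """Return (type_name, value) for an integer literal, or raise ValueError."""
    m = INT_LITERAL.fullmatch(token)
    if m is None:
        raise ValueError("not an integer literal: %r" % token)
    value = int(m.group(1))
    # Assumption: a naked integer (no suffix) is read as int64.
    bits = int(m.group(2)) if m.group(2) else 64
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not low <= value <= high:
        raise ValueError("%d does not fit in int%d" % (value, bits))
    return "int%d" % bits, value


if __name__ == "__main__":
    for token in ("42", "42i32", "-7i8"):
        print(token, classify_int(token))
```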

We should also fill in the exact accepted forms for the various derived
types like the datetime, spatial, hex, and UUID types - e.g., the valid
forms of the double-quoted string in the duration() constructor are as
specified by XML Schema, and so on.

Ceej
aka Chris Hillery

On Tue, Jun 7, 2016 at 8:53 PM, Chris Hillery <ch...@hillery.land> wrote:

> If it's possible, I think it would be least confusing if the serialized
> ADM format was identical to the corresponding data constructors in AQL. It
> should be a goal IMHO that you can cut-and-paste an ADM file into the query
> box in the web UI and the result would be the same as loading the .adm.
>
> For more specifics, I think we need to write out for each data type what
> the current ADM and AQL formats are, and then pick a final answer for the
> type (which may possibly be different from either of the current forms,
> although I suspect not). That will be the spec, and we can update the two
> parsers (and all the test cases) accordingly.
>
> I started an email thread sometime last year about something similar; I
> think it was about JSON serialization, but it at least had the AQL side of
> this story for all simple types, I believe.
>
> Ceej
> aka Chris Hillery

Re: The "real" ADM format

Posted by Chris Hillery <ch...@hillery.land>.

If it's possible, I think it would be least confusing if the serialized ADM
format was identical to the corresponding data constructors in AQL. It
should be a goal IMHO that you can cut-and-paste an ADM file into the query
box in the web UI and the result would be the same as loading the .adm.
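
One way to make this goal testable is a round-trip check: serialize a value,
parse the text back, and compare. The Python sketch below is a toy model
only. The printer and both parsers are made-up stand-ins (none of them is
AsterixDB code); they mimic a printer that emits an 'i32' suffix, a reader
that rejects it, and a reader that accepts it.

```python
def failing_round_trips(values, serialize, parse):
    """Return (value, text) pairs for which parse(serialize(value)) != value."""
    failures = []
    for value in values:
        text = serialize(value)
        try:
            ok = parse(text) == value
        except ValueError:
            ok = False  # the parser rejected the printer's own output
        if not ok:
            failures.append((value, text))
    return failures


def print_int32(value):
    """Toy printer: emits the suffixed form, e.g. 7 -> '7i32'."""
    return "%di32" % value


def parse_naked(text):
    """Toy reader that only understands naked integers."""
    return int(text)


def parse_suffixed(text):
    """Toy reader that also accepts the 'i32' suffix."""
    return int(text[:-3]) if text.endswith("i32") else int(text)


if __name__ == "__main__":
    print(failing_round_trips([7, -1], print_int32, parse_naked))
    print(failing_round_trips([7, -1], print_int32, parse_suffixed))
```

Running the same check against both the ADM loader and the AQL parser, for
every type in the inventory, would show where the two formats diverge.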

For more specifics, I think we need to write out for each data type what
the current ADM and AQL formats are, and then pick a final answer for the
type (which may possibly be different from either of the current forms,
although I suspect not). That will be the spec, and we can update the two
parsers (and all the test cases) accordingly.

I started an email thread sometime last year about something similar; I
think it was about JSON serialization, but it at least had the AQL side of
this story for all simple types, I believe.

Ceej
aka Chris Hillery
On Jun 7, 2016 8:17 PM, "Ian Maxon" <im...@uci.edu> wrote:

> Hi all,
> After my experience with having to fix a rather large ADM file dump from a
> query to make it load back into the system I was compelled to try my hand
> at making that not happen again. The first thing I tried my hand at was
> basically what I did to make the file loadable but inside the type
> printers; just remove all of the 'i32' and so on suffixes, as well as
> making decimals not formatted in scientific notation. This is pretty easy
> to do as well, not a huge change code-wise (but obviously I'll have to fix
> all of the tests).
>
> This got me to think though, which is the format that we actually want? The
> current format that is output, or the format that we accept in the loader?
> Since this is actually perhaps a language level change either way I figured
> I should find consensus before spending more time on it.
>
> Thoughts/comments are appreciated.
>
> Thanks,
> - Ian
>