Posted to dev@drill.apache.org by Ted Dunning <te...@gmail.com> on 2012/10/12 07:48:15 UTC

logical plan design coming together

The design for the logical plan is coming together.  Anybody should be able
to get to the interim design document at

https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit

You should also be able to see the discussion so far.  Many thanks to
Timothy Chen for kibitzing very well as I wrote.  His astute observations
and questions were critical.

I have to go to sleep now, but it would be great to see progress on this
while I sleep.  Remember that comments and questions are as valuable as
text (or more so).  Remember also, this document has a complete history so
we can reconstruct it no matter what happens.

I would particularly like eyes on this (if practical) from Camuel, Jason,
Gera and Julian Hyde.  They have had some very good thoughts about this
layer in the past and probably will spot several errors in what I have
written.

The plan for this document as it stabilizes is to put it into the web-site
under the documentation area.  We will probably want to do that before it
really is done to make sure that people can find it easily and to ensure a
checkpoint is in Apache-land.

See y'all tomorrow.

Re: logical plan design coming together

Posted by Camuel Gilyadov <ca...@gmail.com>.

On Fri, Oct 12, 2012 at 11:32 AM, Julian Hyde <ju...@gmail.com> wrote:

> For those implementing parsing & validation of the query language. Please
> let me share my hard-earned wisdom...
>
> 1. Separate parsing and validation. The parser should do the absolute
> minimum of validation. Don't try to validate identifiers. Don't do any
> type-checking. It will make errors better ('This function needs a boolean
> parameter' versus 'Expecting "true" or "false" or "<token> and" or 101
> other possibilities'.) And allows the parser to stay focused on one task
> which is difficult enough: converting text into a parse tree.
>

Completely agree. I call them the parser stage and the semantic analysis
stage, and they must not be interleaved. Semantic analysis must start only
after the complete query is parsed. Moreover, I have a hard time separating
semantic validation logic from semantic analysis logic. So I decided that
the parser will only parse and not bother with checks like resolving
identifiers. Even if it did, semantic analysis could still detect subtle
new errors in the query structure. So let's assign parsing to the parser
and semantic validation to a semantic analyzer that is completely separate
from the parser.

In particular, the parser will not differentiate between built-in functions
and custom functions. So the parser will not "know" about some reserved
keywords of DrQL, and I think that is a good thing.

In other words, in modern XML/JSON terms :) I would say that the parser
must check the "well-formed-ness" of the DrQL and the semantic analyser
must do the schema validation.
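
The split described above can be sketched roughly as follows. This is only
an illustration, not DrQL: the toy grammar (names and nested function
calls), the parse() and analyze() helpers, and the function registry are
all assumptions made up for the example. The parser accepts any
well-formed call, and only the analyzer knows which functions exist.

```python
import re
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Name:
    text: str


@dataclass(frozen=True)
class Call:
    function: str
    args: Tuple["Expr", ...]


Expr = Union[Name, Call]

TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z_0-9]*|[(),])")
PUNCTUATION = {"(", ")", ","}


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Stage 1: check well-formedness only. No name lookup, no type checks."""
    tokens = tokenize(text)
    expr, rest = _parse_expr(tokens, 0)
    if rest != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[rest]!r}")
    return expr


def _parse_expr(tokens, i):
    if i >= len(tokens) or tokens[i] in PUNCTUATION:
        raise SyntaxError("expected a name")
    name = tokens[i]
    i += 1
    if i < len(tokens) and tokens[i] == "(":
        i += 1
        if i < len(tokens) and tokens[i] == ")":
            return Call(name, ()), i + 1
        args = []
        while True:
            arg, i = _parse_expr(tokens, i)
            args.append(arg)
            if i < len(tokens) and tokens[i] == ",":
                i += 1
                continue
            if i < len(tokens) and tokens[i] == ")":
                return Call(name, tuple(args)), i + 1
            raise SyntaxError("expected ',' or ')'")
    return Name(name), i


def analyze(expr, functions, columns):
    """Stage 2: runs on the complete tree and only reads it.

    `functions` maps function name -> arity; built-in and custom functions
    live in the same (hypothetical) registry.
    """
    errors = []
    if isinstance(expr, Name):
        if expr.text not in columns:
            errors.append(f"unknown column {expr.text!r}")
    else:
        arity = functions.get(expr.function)
        if arity is None:
            errors.append(f"unknown function {expr.function!r}")
        elif arity != len(expr.args):
            errors.append(
                f"{expr.function} expects {arity} argument(s), "
                f"got {len(expr.args)}"
            )
        for arg in expr.args:
            errors.extend(analyze(arg, functions, columns))
    return errors
```

With this shape, parse("frobnicate(a, upper(b))") succeeds even though
nothing named frobnicate is registered; the complaint about the unknown
function comes from analyze(), where it can be phrased in terms of the
query rather than in terms of expected tokens.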

>
> 2. During the validation phase, do not modify the parse tree. If you need
> to annotate each node with a type, put it into a map from parse tree node
> -> type, not into a field in each node. Put any state you need (e.g. scope
> for resolving identifiers) into a temporary state that exists only during
> validation (think of the visitor pattern). And definitely do not do any
> tree-surgery. If you need to rewrite the tree, do it post validation. (In
> the planner, or just before planning, is a good time.) See
> http://en.wikipedia.org/wiki/Immutable_object.
>

Well, I understand the point here. However, I still think it is worth
putting all the work of converting the parse tree to an AST on ANTLR's
shoulders, saving us this chunk of logic altogether. The price to pay is
somewhat cryptic error messages when the DrQL is not even parsable, or is
not "well-formed" if you like that term more. If DrQL were a stable
language following some standard, then I would back the approach of a
hand-coded parse tree => AST conversion. However, DrQL syntax will most
probably keep evolving, to say the least, so why spend time hand-coding a
parse tree => AST conversion when it will be outdated in a few weeks?

>
> Julian
>
> On Oct 12, 2012, at 10:34 AM, Ted Dunning <te...@gmail.com> wrote:
>
> > Great comments.
> >
> > One particular high-level comment that Julian made is a criticism that I
> > have made in the past of other projects.  It is probably good for my
> > character to be on the receiving side of this criticism for once.
> >
> > The question is why should we use/invent a new concrete syntax when JSON
> > would do just as well (I am dropping the XML part of the suggestion due
> to
> > known prejudices on this list).
> >
> > I don't have a good answer to this question.  It makes certain problems
> > quite a bit easier.  Moreover, I have said in the past that it is nuts to
> > re-invent concrete syntax for config files and extension languages like
> > this.
> >
> > My course going forward is that I think I will put down both syntaxes and
> > let folks form their own opinion.  Using JSON will definitely move things
> > ahead more quickly since other folks have done the parser for us.
> >
> > On Fri, Oct 12, 2012 at 12:05 AM, Julian Hyde <ju...@gmail.com>
> wrote:
> >
> >> Ted,
> >>
> >> Great start. I've made some comments on the doc.
> >>
> >> Julian
> >>
> >> On Oct 11, 2012, at 10:48 PM, Ted Dunning <te...@gmail.com>
> wrote:
> >>
> >>> The design for the logical plan is coming together.  Anybody should be
> >> able
> >>> to get to the interim design document at
> >>>
> >>>
> >>
> https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit
> >>>
> >>> You should also be able to see the discussion so far.  Many thanks to
> >>> Timothy Chen for kibitzing very well as I wrote.  His astute
> observations
> >>> and questions were critical.
> >>>
> >>> I have to go sleep now, but it would be great to see progress on this
> >> while
> >>> I sleep.  Remember that comments and questions are as valuable (or more
> >> so)
> >>> than text.  Remember also, this document has a complete history so we
> can
> >>> reconstruct it no matter what happens.
> >>>
> >>> I would particularly like eyes on this (if practical) from Camuel,
> Jason,
> >>> Gera and Julian Hyde.  They have had some very good thoughts about this
> >>> layer in the past and probably will spot several errors in what I have
> >>> written.
> >>>
> >>> The plan for this document as it stabilizes is to put it into the
> >> web-site
> >>> under the documentation area.  WE will probably want to do that before
> it
> >>> really is done to make sure that people can find it easily and to
> ensure
> >> a
> >>> checkpoint is in Apache-land.
> >>>
> >>> See y'all tomorrow.
> >>
> >>
>
>

Re: logical plan design coming together

Posted by Julian Hyde <ju...@gmail.com>.

On Oct 12, 2012, at 12:04 PM, Ted Dunning <te...@gmail.com> wrote:

> I spoke with Jason and he is a little less enthusiastic about JSON for an
> SSA sort of language.
> 
> I think having me elaborate the current document with JSON versions would
> help the discussion be more concrete.

To be clear: I am fine with the LLVM-style representation. I actually think it's rather neat to make the query-preparation process as similar to code compilation as possible, and see where that takes us. I was listing some things to watch out for, just to make sure that this works in the large. (I would have had a longer list if you'd proposed XML or JSON. :) )

Julian

Re: logical plan design coming together

Posted by Ted Dunning <te...@gmail.com>.

I spoke with Jason and he is a little less enthusiastic about JSON for an
SSA sort of language.

I think having me elaborate the current document with JSON versions would
help the discussion be more concrete.

On Fri, Oct 12, 2012 at 12:00 PM, Timothy Chen <tn...@gmail.com> wrote:

> So is the conclusion now is to decide a new logical plan schema based on
> JSON?
>
> Tim
>
> On Fri, Oct 12, 2012 at 11:32 AM, Julian Hyde <ju...@gmail.com>
> wrote:
>
> > For those implementing parsing & validation of the query language. Please
> > let me share my hard-earned wisdom...
> >
> > 1. Separate parsing and validation. The parser should do the absolute
> > minimum of validation. Don't try to validate identifiers. Don't do any
> > type-checking. It will make errors better ('This function needs a boolean
> > parameter' versus 'Expecting "true" or "false" or "<token> and" or 101
> > other possibilities'.) And allows the parser to stay focused on one task
> > which is difficult enough: converting text into a parse tree.
> >
> > 2. During the validation phase, do not modify the parse tree. If you need
> > to annotate each node with a type, put it into a map from parse tree node
> > -> type, not into a field in each node. Put any state you need (e.g.
> scope
> > for resolving identifiers) into a temporary state that exists only during
> > validation (think of the visitor pattern). And definitely do not do any
> > tree-surgery. If you need to rewrite the tree, do it post validation. (In
> > the planner, or just before planning, is a good time.) See
> > http://en.wikipedia.org/wiki/Immutable_object.
> >
> > Julian
> >
> > On Oct 12, 2012, at 10:34 AM, Ted Dunning <te...@gmail.com> wrote:
> >
> > > Great comments.
> > >
> > > One particular high-level comment that Julian made is a criticism that
> I
> > > have made in the past of other projects.  It is probably good for my
> > > character to be on the receiving side of this criticism for once.
> > >
> > > The question is why should we use/invent a new concrete syntax when
> JSON
> > > would do just as well (I am dropping the XML part of the suggestion due
> > to
> > > known prejudices on this list).
> > >
> > > I don't have a good answer to this question.  It makes certain problems
> > > quite a bit easier.  Moreover, I have said in the past that it is nuts
> to
> > > re-invent concrete syntax for config files and extension languages like
> > > this.
> > >
> > > My course going forward is that I think I will put down both syntaxes
> and
> > > let folks form their own opinion.  Using JSON will definitely move
> things
> > > ahead more quickly since other folks have done the parser for us.
> > >
> > > On Fri, Oct 12, 2012 at 12:05 AM, Julian Hyde <ju...@gmail.com>
> > wrote:
> > >
> > >> Ted,
> > >>
> > >> Great start. I've made some comments on the doc.
> > >>
> > >> Julian
> > >>
> > >> On Oct 11, 2012, at 10:48 PM, Ted Dunning <te...@gmail.com>
> > wrote:
> > >>
> > >>> The design for the logical plan is coming together.  Anybody should
> be
> > >> able
> > >>> to get to the interim design document at
> > >>>
> > >>>
> > >>
> >
> https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit
> > >>>
> > >>> You should also be able to see the discussion so far.  Many thanks to
> > >>> Timothy Chen for kibitzing very well as I wrote.  His astute
> > observations
> > >>> and questions were critical.
> > >>>
> > >>> I have to go sleep now, but it would be great to see progress on this
> > >> while
> > >>> I sleep.  Remember that comments and questions are as valuable (or
> more
> > >> so)
> > >>> than text.  Remember also, this document has a complete history so we
> > can
> > >>> reconstruct it no matter what happens.
> > >>>
> > >>> I would particularly like eyes on this (if practical) from Camuel,
> > Jason,
> > >>> Gera and Julian Hyde.  They have had some very good thoughts about
> this
> > >>> layer in the past and probably will spot several errors in what I
> have
> > >>> written.
> > >>>
> > >>> The plan for this document as it stabilizes is to put it into the
> > >> web-site
> > >>> under the documentation area.  WE will probably want to do that
> before
> > it
> > >>> really is done to make sure that people can find it easily and to
> > ensure
> > >> a
> > >>> checkpoint is in Apache-land.
> > >>>
> > >>> See y'all tomorrow.
> > >>
> > >>
> >
> >
>

Re: logical plan design coming together

Posted by Timothy Chen <tn...@gmail.com>.

So is the conclusion now to decide on a new logical plan schema based on
JSON?

Tim

On Fri, Oct 12, 2012 at 11:32 AM, Julian Hyde <ju...@gmail.com> wrote:

> For those implementing parsing & validation of the query language. Please
> let me share my hard-earned wisdom...
>
> 1. Separate parsing and validation. The parser should do the absolute
> minimum of validation. Don't try to validate identifiers. Don't do any
> type-checking. It will make errors better ('This function needs a boolean
> parameter' versus 'Expecting "true" or "false" or "<token> and" or 101
> other possibilities'.) And allows the parser to stay focused on one task
> which is difficult enough: converting text into a parse tree.
>
> 2. During the validation phase, do not modify the parse tree. If you need
> to annotate each node with a type, put it into a map from parse tree node
> -> type, not into a field in each node. Put any state you need (e.g. scope
> for resolving identifiers) into a temporary state that exists only during
> validation (think of the visitor pattern). And definitely do not do any
> tree-surgery. If you need to rewrite the tree, do it post validation. (In
> the planner, or just before planning, is a good time.) See
> http://en.wikipedia.org/wiki/Immutable_object.
>
> Julian
>
> On Oct 12, 2012, at 10:34 AM, Ted Dunning <te...@gmail.com> wrote:
>
> > Great comments.
> >
> > One particular high-level comment that Julian made is a criticism that I
> > have made in the past of other projects.  It is probably good for my
> > character to be on the receiving side of this criticism for once.
> >
> > The question is why should we use/invent a new concrete syntax when JSON
> > would do just as well (I am dropping the XML part of the suggestion due
> to
> > known prejudices on this list).
> >
> > I don't have a good answer to this question.  It makes certain problems
> > quite a bit easier.  Moreover, I have said in the past that it is nuts to
> > re-invent concrete syntax for config files and extension languages like
> > this.
> >
> > My course going forward is that I think I will put down both syntaxes and
> > let folks form their own opinion.  Using JSON will definitely move things
> > ahead more quickly since other folks have done the parser for us.
> >
> > On Fri, Oct 12, 2012 at 12:05 AM, Julian Hyde <ju...@gmail.com>
> wrote:
> >
> >> Ted,
> >>
> >> Great start. I've made some comments on the doc.
> >>
> >> Julian
> >>
> >> On Oct 11, 2012, at 10:48 PM, Ted Dunning <te...@gmail.com>
> wrote:
> >>
> >>> The design for the logical plan is coming together.  Anybody should be
> >> able
> >>> to get to the interim design document at
> >>>
> >>>
> >>
> https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit
> >>>
> >>> You should also be able to see the discussion so far.  Many thanks to
> >>> Timothy Chen for kibitzing very well as I wrote.  His astute
> observations
> >>> and questions were critical.
> >>>
> >>> I have to go sleep now, but it would be great to see progress on this
> >> while
> >>> I sleep.  Remember that comments and questions are as valuable (or more
> >> so)
> >>> than text.  Remember also, this document has a complete history so we
> can
> >>> reconstruct it no matter what happens.
> >>>
> >>> I would particularly like eyes on this (if practical) from Camuel,
> Jason,
> >>> Gera and Julian Hyde.  They have had some very good thoughts about this
> >>> layer in the past and probably will spot several errors in what I have
> >>> written.
> >>>
> >>> The plan for this document as it stabilizes is to put it into the
> >> web-site
> >>> under the documentation area.  WE will probably want to do that before
> it
> >>> really is done to make sure that people can find it easily and to
> ensure
> >> a
> >>> checkpoint is in Apache-land.
> >>>
> >>> See y'all tomorrow.
> >>
> >>
>
>

Re: logical plan design coming together

Posted by Julian Hyde <ju...@gmail.com>.

For those implementing parsing & validation of the query language, please let me share my hard-earned wisdom...

1. Separate parsing and validation. The parser should do the absolute minimum of validation. Don't try to validate identifiers. Don't do any type-checking. This makes error messages better ('This function needs a boolean parameter' versus 'Expecting "true" or "false" or "<token> and" or 101 other possibilities'). It also allows the parser to stay focused on one task, which is difficult enough: converting text into a parse tree.

2. During the validation phase, do not modify the parse tree. If you need to annotate each node with a type, put it into a map from parse tree node -> type, not into a field in each node. Put any state you need (e.g. scope for resolving identifiers) into a temporary state that exists only during validation (think of the visitor pattern). And definitely do not do any tree-surgery. If you need to rewrite the tree, do it post validation. (In the planner, or just before planning, is a good time.) See http://en.wikipedia.org/wiki/Immutable_object.
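
A minimal sketch of point 2, under stated assumptions: the node classes
(Literal, Column, And), the Validator class, and the type names are all
invented for illustration and are not taken from the design document. The
parse tree nodes are frozen, the node -> type annotations live in a side
table, and the scope and error list exist only on the validator object.

```python
from dataclasses import dataclass


# eq=False keeps the default identity-based __eq__/__hash__, so two nodes
# that look the same are still separate keys in the side table.
@dataclass(frozen=True, eq=False)
class Literal:
    value: object


@dataclass(frozen=True, eq=False)
class Column:
    name: str


@dataclass(frozen=True, eq=False)
class And:
    left: object
    right: object


LITERAL_TYPES = {bool: "boolean", int: "integer", str: "string"}


class Validator:
    """Visitor that holds all validation state outside the tree."""

    def __init__(self, schema):
        self.schema = schema  # scope for resolving identifiers
        self.types = {}       # parse tree node -> type name
        self.errors = []

    def visit(self, node):
        # Dispatch on the node class name, visitor-pattern style.
        method = getattr(self, "visit_" + type(node).__name__)
        node_type = method(node)
        self.types[node] = node_type  # annotate in the map, not the node
        return node_type

    def visit_Literal(self, node):
        return LITERAL_TYPES.get(type(node.value), "unknown")

    def visit_Column(self, node):
        if node.name not in self.schema:
            self.errors.append(f"unknown column {node.name!r}")
            return "unknown"
        return self.schema[node.name]

    def visit_And(self, node):
        for side in (node.left, node.right):
            if self.visit(side) not in ("boolean", "unknown"):
                self.errors.append("AND needs boolean operands")
        return "boolean"
```

A later stage such as the planner can take the untouched tree plus the
`types` map and build a rewritten tree from them, which keeps any tree
surgery after validation.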

Julian

On Oct 12, 2012, at 10:34 AM, Ted Dunning <te...@gmail.com> wrote:

> Great comments.
> 
> One particular high-level comment that Julian made is a criticism that I
> have made in the past of other projects.  It is probably good for my
> character to be on the receiving side of this criticism for once.
> 
> The question is why should we use/invent a new concrete syntax when JSON
> would do just as well (I am dropping the XML part of the suggestion due to
> known prejudices on this list).
> 
> I don't have a good answer to this question.  It makes certain problems
> quite a bit easier.  Moreover, I have said in the past that it is nuts to
> re-invent concrete syntax for config files and extension languages like
> this.
> 
> My course going forward is that I think I will put down both syntaxes and
> let folks form their own opinion.  Using JSON will definitely move things
> ahead more quickly since other folks have done the parser for us.
> 
> On Fri, Oct 12, 2012 at 12:05 AM, Julian Hyde <ju...@gmail.com> wrote:
> 
>> Ted,
>> 
>> Great start. I've made some comments on the doc.
>> 
>> Julian
>> 
>> On Oct 11, 2012, at 10:48 PM, Ted Dunning <te...@gmail.com> wrote:
>> 
>>> The design for the logical plan is coming together.  Anybody should be
>> able
>>> to get to the interim design document at
>>> 
>>> 
>> https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit
>>> 
>>> You should also be able to see the discussion so far.  Many thanks to
>>> Timothy Chen for kibitzing very well as I wrote.  His astute observations
>>> and questions were critical.
>>> 
>>> I have to go sleep now, but it would be great to see progress on this
>> while
>>> I sleep.  Remember that comments and questions are as valuable (or more
>> so)
>>> than text.  Remember also, this document has a complete history so we can
>>> reconstruct it no matter what happens.
>>> 
>>> I would particularly like eyes on this (if practical) from Camuel, Jason,
>>> Gera and Julian Hyde.  They have had some very good thoughts about this
>>> layer in the past and probably will spot several errors in what I have
>>> written.
>>> 
>>> The plan for this document as it stabilizes is to put it into the
>> web-site
>>> under the documentation area.  WE will probably want to do that before it
>>> really is done to make sure that people can find it easily and to ensure
>> a
>>> checkpoint is in Apache-land.
>>> 
>>> See y'all tomorrow.
>> 
>>

Re: logical plan design coming together

Posted by Ted Dunning <te...@gmail.com>.

Great comments.

One particular high-level comment that Julian made is a criticism that I
have made in the past of other projects.  It is probably good for my
character to be on the receiving side of this criticism for once.

The question is why should we use/invent a new concrete syntax when JSON
would do just as well (I am dropping the XML part of the suggestion due to
known prejudices on this list).

I don't have a good answer to this question.  It makes certain problems
quite a bit easier.  Moreover, I have said in the past that it is nuts to
re-invent concrete syntax for config files and extension languages like
this.

My course going forward is that I think I will put down both syntaxes and
let folks form their own opinion.  Using JSON will definitely move things
ahead more quickly since other folks have done the parser for us.
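
To make the "parser already done for us" point concrete, here is a rough
sketch of what a JSON-encoded plan could look like when loaded with a stock
JSON parser. The step fields ("id", "op", "input") and the operator names
are hypothetical and are not taken from the design document; the
load_plan() helper is likewise an assumption made up for the example.

```python
import json

# Hypothetical plan: each step is assigned one id and refers to earlier ids.
PLAN_TEXT = """
[
  {"id": 1, "op": "scan",    "source": "table-1"},
  {"id": 2, "op": "filter",  "input": 1, "expr": "x > 3"},
  {"id": 3, "op": "project", "input": 2, "columns": ["x"]}
]
"""


def load_plan(text):
    """Parse with the standard JSON parser, then validate separately.

    The checks are the two an SSA-like form needs: every id is assigned
    exactly once, and every input refers to an id defined earlier.
    """
    steps = json.loads(text)
    defined = set()
    for step in steps:
        if step["id"] in defined:
            raise ValueError(f"id {step['id']} assigned twice")
        ref = step.get("input")
        if ref is not None and ref not in defined:
            raise ValueError(
                f"step {step['id']} uses undefined input {ref}"
            )
        defined.add(step["id"])
    return steps
```

Only the structural checks are hand-written here; tokenizing, nesting, and
string escaping all come from the json module.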

On Fri, Oct 12, 2012 at 12:05 AM, Julian Hyde <ju...@gmail.com> wrote:

> Ted,
>
> Great start. I've made some comments on the doc.
>
> Julian
>
> On Oct 11, 2012, at 10:48 PM, Ted Dunning <te...@gmail.com> wrote:
>
> > The design for the logical plan is coming together.  Anybody should be
> able
> > to get to the interim design document at
> >
> >
> https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit
> >
> > You should also be able to see the discussion so far.  Many thanks to
> > Timothy Chen for kibitzing very well as I wrote.  His astute observations
> > and questions were critical.
> >
> > I have to go sleep now, but it would be great to see progress on this
> while
> > I sleep.  Remember that comments and questions are as valuable (or more
> so)
> > than text.  Remember also, this document has a complete history so we can
> > reconstruct it no matter what happens.
> >
> > I would particularly like eyes on this (if practical) from Camuel, Jason,
> > Gera and Julian Hyde.  They have had some very good thoughts about this
> > layer in the past and probably will spot several errors in what I have
> > written.
> >
> > The plan for this document as it stabilizes is to put it into the
> web-site
> > under the documentation area.  WE will probably want to do that before it
> > really is done to make sure that people can find it easily and to ensure
> a
> > checkpoint is in Apache-land.
> >
> > See y'all tomorrow.
>
>

Re: logical plan design coming together

Posted by Julian Hyde <ju...@gmail.com>.

Ted,

Great start. I've made some comments on the doc.

Julian

On Oct 11, 2012, at 10:48 PM, Ted Dunning <te...@gmail.com> wrote:

> The design for the logical plan is coming together.  Anybody should be able
> to get to the interim design document at
> 
> https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit
> 
> You should also be able to see the discussion so far.  Many thanks to
> Timothy Chen for kibitzing very well as I wrote.  His astute observations
> and questions were critical.
> 
> I have to go sleep now, but it would be great to see progress on this while
> I sleep.  Remember that comments and questions are as valuable (or more so)
> than text.  Remember also, this document has a complete history so we can
> reconstruct it no matter what happens.
> 
> I would particularly like eyes on this (if practical) from Camuel, Jason,
> Gera and Julian Hyde.  They have had some very good thoughts about this
> layer in the past and probably will spot several errors in what I have
> written.
> 
> The plan for this document as it stabilizes is to put it into the web-site
> under the documentation area.  WE will probably want to do that before it
> really is done to make sure that people can find it easily and to ensure a
> checkpoint is in Apache-land.
> 
> See y'all tomorrow.