You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Andrew Pilloud <ap...@google.com> on 2018/05/07 22:16:13 UTC

Merging our two SQL parser configs

So we have two incompatible SQL parser configs in beam. One is in
BeamQueryPlanner
<https://github.com/apache/beam/blob/8ef71b6eb1d2d5c63974ec506a01faf3813efe74/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L95>
 which is used by default and a second in BeamSqlParser
<https://github.com/apache/beam/blob/598774738e7a1236cf30f70a584311cee52d1818/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamSqlParser.java#L33>
which is used only in the BeamSqlCli execute path (not the explain path).
There are also a bunch of 'toLowerCase()' calls scattered around our code.
I'd like to get us on one parser config and remove the need for toLowerCase
calls.

To do this, I am proposing we standardize these all to go through
BeamQueryPlanner, use the Calcite Lex.JAVA config, and drop the
'toLowerCase()' calls. This will result in the parser preserving case,
being case sensitive, and using backticks for quoted identifiers. This is
the same as the default config in Apache Flink and is roughly compatible
with BigQuery. It effectively leaves the default path unchanged, except
case will now be preserved and checked consistently. The BeamSqlCli execute
path will remain unchanged for unquoted queries with all lower case names,
which is what we have tested. Comments? Objections?

Andrew

Re: Merging our two SQL parser configs

Posted by Andrew Pilloud <ap...@google.com>.
Haven't heard anything, so I wrote up the change:
https://github.com/apache/beam/pull/5325

Andrew

On Mon, May 7, 2018 at 3:16 PM Andrew Pilloud <ap...@google.com> wrote:

> So we have two incompatible SQL parser configs in beam. One is in
> BeamQueryPlanner
> <https://github.com/apache/beam/blob/8ef71b6eb1d2d5c63974ec506a01faf3813efe74/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L95>
>  which is used by default and a second in BeamSqlParser
> <https://github.com/apache/beam/blob/598774738e7a1236cf30f70a584311cee52d1818/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/parser/BeamSqlParser.java#L33>
> which is used only in the BeamSqlCli execute path (not the explain path).
> There are also a bunch of 'toLowerCase()' calls scattered around our code.
> I'd like to get us on one parser config and remove the need for toLowerCase
> calls.
>
> To do this, I am proposing we standardize these all to go through
> BeamQueryPlanner, use the Calcite Lex.JAVA config, and drop the
> 'toLowerCase()' calls. This will result in the parser preserving case,
> being case sensitive, and using backticks for quoted identifiers. This is
> the same as the default config in Apache Flink and is roughly compatible
> with BigQuery. It effectively leaves the default path unchanged, except
> case will now be preserved and checked consistently. The BeamSqlCli execute
> path will remain unchanged for unquoted queries with all lower case names,
> which is what we have tested. Comments? Objections?
>
> Andrew
>