You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@calcite.apache.org by Mark Grey <mg...@spotify.com.INVALID> on 2021/10/20 18:25:03 UTC

Pseudocolumn support

Hello Calcite community,

First of all, many thanks to the many contributors for this awesome and
very useful project!

I had a question with regard to amending the behavior of the SqlValidator
to account for the existence in some dialects of SQL of pseudocolumns.
Examples of this sort of construct exist in BigQuery, where they are used
as a mechanism for sharding
<https://cloud.google.com/bigquery/docs/querying-wildcard-tables#filtering_selected_tables_using_table_suffix>
or partitioning
<https://cloud.google.com/bigquery/docs/querying-wildcard-tables#scanning_a_range_of_ingestion-time_partitioned_tables_using_partitiontime>.
They are essentially column identifiers that do some special operations
within the query without belonging to a specific table within the schema.
They can legally appear in projection, filtering and grouping clauses as
normal columns would, but are not members of the tables referenced within
the FROM clause.

Is there possibly a way to extend the SqlValidator to pass these cases and
treat them as relations that exist at the schema level?  Or alternatively,
to treat them as literals?  Our use case is such that we'd need the
validator to pass queries that include them, but not necessarily extract
them into an underlying relational algebra expression.

Would appreciate any pointers on classes to look at for inspiration on a
possible approach.

Thanks!

Mark

Re: Pseudocolumn support

Posted by Julian Hyde <jh...@gmail.com>.

There’s already an issue logged for this: https://issues.apache.org/jira/browse/CALCITE-4362 <https://issues.apache.org/jira/browse/CALCITE-4362>. I commented on that case with a sketch of how to approach it.

As I understand it, these columns do “belong to the table” in some sense. If you are joining tables T1 and T2, they each independently have their own _PARTITIONDATE column, which you would write T1._PARTITIONDATE and T2._PARTITIONDATE. So they are more like Oracle’s ROWID than, say, the CURRENT_TIMESTAMP function which is global to the query.

Let’s discuss further in the JIRA case. 

> On Oct 20, 2021, at 11:25 AM, Mark Grey <mg...@spotify.com.INVALID> wrote:
> 
> Hello Calcite community,
> 
> First of all, many thanks to the many contributors for this awesome and
> very useful project!
> 
> I had a question with regard to amending the behavior of the SqlValidator
> to account for the existence in some dialects of SQL of pseudocolumns.
> Examples of this sort of construct exist in BigQuery, where they are used
> as a mechanism for sharding
> <https://cloud.google.com/bigquery/docs/querying-wildcard-tables#filtering_selected_tables_using_table_suffix>
> or partitioning
> <https://cloud.google.com/bigquery/docs/querying-wildcard-tables#scanning_a_range_of_ingestion-time_partitioned_tables_using_partitiontime>.
> They are essentially column identifiers that do some special operations
> within the query without belonging to a specific table within the schema.
> They can legally appear in projection, filtering and grouping clauses as
> normal columns would, but are not members of the tables referenced within
> the FROM clause.
> 
> Is there possibly a way to extend the SqlValidator to pass these cases and
> treat them as relations that exist at the schema level?  Or alternatively,
> to treat them as literals?  Our use case is such that we'd need the
> validator to pass queries that include them, but not necessarily extract
> them into an underlying relational algebra expression.
> 
> Would appreciate any pointers on classes to look at for inspiration on a
> possible approach.
> 
> Thanks!
> 
> Mark