You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "paul-rogers (via GitHub)" <gi...@apache.org> on 2023/02/09 01:55:00 UTC

[GitHub] [druid] paul-rogers opened a new pull request, #13777: Information schema uses numeric column types

paul-rogers opened a new pull request, #13777:
URL: https://github.com/apache/druid/pull/13777

   DruidDruid has an `INFORMATION_SCHEMA` to provide metadata about Druid tables and columns. Historically, columns that contain numbers (such as `ORDINAL_POSITION`) were represented as `VARCHAR`. This seems to be due to an implementation limitation. Some numeric columns (such as `NUMERIC_PRECISION`) are valid only some of the time, and require SQL `NULL` values other times. The schema of the information schema columns was describe using Druid types, and Druid does not seem to provide a nullable `BIGINT` column. Thus, the only way to set `NUMERIC_PRECISION` to `NULL` is to define it as a Druid `string`.)
   
   While this works, it has the unfortunate result of sorting incorrectly. Consider the following query:
   
   ```sql
   SELECT ORDINAL_POSITION, COLUMN_NAME
   FROM INFORMATION_SCHEMA.COLUMNS
   WHERE TABLE_NAME = 'foo'
   ORDER BY ORDINAL_POSITION
   ```
   
   If the table has more than ten columns, the ordering  of the ordinals is 1, 11, 12, 2, ... because "11" sorts as a string before "2". Needless to say, this is quite surprising. It makes it hard for a client to obtain columns in ordinal order. (The workaround is to _not_ sort since the columns are produced in the correct order, but this is a somewhat silly workaround.)
   
   The solution is to use SQL types, not Druid types, to declare the table schema. After all, the information schema tables are processed by the Linq4j engine, which understands SQL types. We had to (incorrectly) translate the Druid types to SQL.
   
   After making this change, the ordinals do, in fact, sort in numeric order.
   
   In testing this change, I noticed that Druid numbers columns counting from 0. However, SQL prefers counting from 1. So, this change also modifies the column ordinals to start with 1.
   
   Updated the documentation to use SQL types (not Druid types) in the description of the system tables. These tables can only be accessed via SQL and thus only the SQL types are relevant.
   
   #### Release note
   
   The affect of this change is to make the `INFORMATION_SCHEMA` tables more SQL-like. If a client was depending on numeric fields being strings, then that client may produce unexpected results. Similarly, if a client was expecting columns to be numbered from 0, then the new 1-based numbering may impact the client. Otherwise, clients will find that queries against `INFORMATION_SCHEMA` tables act more like a user would expect: more like they act in other databases.
   
   <hr>
   
   This PR has:
   
   - [X] been self-reviewed.
   - [X] added documentation for new or modified features or behaviors.
   - [X] a release note entry in the PR description.
   - [X] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on a diff in pull request #13777: Information schema now uses numeric column types

Posted by "paul-rogers (via GitHub)" <gi...@apache.org>.
paul-rogers commented on code in PR #13777:
URL: https://github.com/apache/druid/pull/13777#discussion_r1104791829


##########
sql/src/main/java/org/apache/druid/sql/calcite/schema/InformationSchema.java:
##########
@@ -73,44 +73,65 @@ public class InformationSchema extends AbstractSchema
   private static final String SCHEMATA_TABLE = "SCHEMATA";
   private static final String TABLES_TABLE = "TABLES";
   private static final String COLUMNS_TABLE = "COLUMNS";
-  private static final RowSignature SCHEMATA_SIGNATURE = RowSignature

Review Comment:
   This is, in fact, the core issue in this PR. If we use a Druid signature and round-trip to SQL, we lose SQL semantics. However, these are SQL, not Druid tables, so SQL semantics apply.
   
   If we were to execute system tables queries with a Druid engine then the most obvious choice is to integrate SQL semantics into the Druid engine. That is, numeric types can be null. There may be historical reasons why numbers can't be null in Druid, but there are also good historical reasons why numeric types can be null in SQL.
   
   The question is whether we want to alter SQL to follow Druid semantics or the other way around. I a consequence of using Druid semantics is that numbers sort in the wrong order, then I do suspect SQL is more right than Druid in this one case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13777: Information schema now uses numeric column types

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13777:
URL: https://github.com/apache/druid/pull/13777#discussion_r1103891003


##########
sql/src/main/java/org/apache/druid/sql/calcite/schema/InformationSchema.java:
##########
@@ -73,44 +73,65 @@ public class InformationSchema extends AbstractSchema
   private static final String SCHEMATA_TABLE = "SCHEMATA";
   private static final String TABLES_TABLE = "TABLES";
   private static final String COLUMNS_TABLE = "COLUMNS";
-  private static final RowSignature SCHEMATA_SIGNATURE = RowSignature

Review Comment:
   nit: why the change from using `RowSignature`? Just thinking ahead and wondering if this has any impact on if we want to make `INFORMATION_SCHEMA` and `sys` tables be executed with the Druid engine (instead of the "bindable" fallback engine). If they are modeled as native datasources then we would want these to be defined in terms of the native types...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers merged pull request #13777: Information schema now uses numeric column types

Posted by "paul-rogers (via GitHub)" <gi...@apache.org>.
paul-rogers merged PR #13777:
URL: https://github.com/apache/druid/pull/13777


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on pull request #13777: Information schema now uses numeric column types

Posted by "paul-rogers (via GitHub)" <gi...@apache.org>.
paul-rogers commented on PR #13777:
URL: https://github.com/apache/druid/pull/13777#issuecomment-1428330569

   > is it really necessary to commit all of these .pyc files?
   
   No, it is not, thanks for catching the error. That's an artifact of switching branches. Removed them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13777: Information schema now uses numeric column types

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13777:
URL: https://github.com/apache/druid/pull/13777#discussion_r1103891003


##########
sql/src/main/java/org/apache/druid/sql/calcite/schema/InformationSchema.java:
##########
@@ -73,44 +73,65 @@ public class InformationSchema extends AbstractSchema
   private static final String SCHEMATA_TABLE = "SCHEMATA";
   private static final String TABLES_TABLE = "TABLES";
   private static final String COLUMNS_TABLE = "COLUMNS";
-  private static final RowSignature SCHEMATA_SIGNATURE = RowSignature

Review Comment:
   nit: why the change from using `RowSignature`? Just thinking ahead and wondering if this has any impact on if we want to make `INFORMATION_SCHEMA` and `sys` tables be executed with the Druid engine (instead of the "bindable" fallback engine). If they are modeled as native datasources then we would want these to be defined in terms of the native types.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13777: Information schema now uses numeric column types

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13777:
URL: https://github.com/apache/druid/pull/13777#discussion_r1105040084


##########
sql/src/main/java/org/apache/druid/sql/calcite/schema/InformationSchema.java:
##########
@@ -73,44 +73,65 @@ public class InformationSchema extends AbstractSchema
   private static final String SCHEMATA_TABLE = "SCHEMATA";
   private static final String TABLES_TABLE = "TABLES";
   private static final String COLUMNS_TABLE = "COLUMNS";
-  private static final RowSignature SCHEMATA_SIGNATURE = RowSignature

Review Comment:
   >If we were to execute system tables queries with a Druid engine then the most obvious choice is to integrate SQL semantics into the Druid engine. That is, numeric types can be null. There may be historical reasons why numbers can't be null in Druid, but there are also good historical reasons why numeric types can be null in SQL.
   
   I think you misunderstood or I wasn't very articulate; numeric types can already be nullable in Druids native type system, we just have a mode, `druid.generic.useDefaultValueForNull=true`, in which numbers are never null and replaced with zeros. This is embedded in the conversion logic to allow the planner to optimize things when it knows things will never be nullable. The problem here is using that same conversion logic for tables that don't follow the rule.
   
   I guess the main thing I was trying to discuss here is that afaik there are still (non-immediate) plans to push all system tables down into the druid engine instead of using the fallback "bindable" engine, so that we have a consistent set of behavior and operators no matter what table you are querying, and at which point we would probably want these schemas to be expressed in Druid native type system for describing tables, (`RowSignature`), so I was wondering if that means we would just end up be changing these back at that point.
   
   I think its probably fine to make this change, git remembers, so we can always fetch these `RowSignatures` (and modify to use nullable numerics) from git history if/when we make the system tables some sort of native druid table and queries in the future.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on pull request #13777: Information schema now uses numeric column types

Posted by "paul-rogers (via GitHub)" <gi...@apache.org>.
paul-rogers commented on PR #13777:
URL: https://github.com/apache/druid/pull/13777#issuecomment-1435358089

   After discussion with @clintropolis, we decided to use the approach here for now. There may be upcoming changes that allow a `RowSignature` to contain a nullability flag, and that uses the Druid engine for system tables. When those occur, we can revert to using the now-more-powerful `RowSignature` for table schema. For now, the SQL approach gets us what we want. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] paul-rogers commented on a diff in pull request #13777: Information schema now uses numeric column types

Posted by "paul-rogers (via GitHub)" <gi...@apache.org>.
paul-rogers commented on code in PR #13777:
URL: https://github.com/apache/druid/pull/13777#discussion_r1105159319


##########
sql/src/main/java/org/apache/druid/sql/calcite/schema/InformationSchema.java:
##########
@@ -73,44 +73,65 @@ public class InformationSchema extends AbstractSchema
   private static final String SCHEMATA_TABLE = "SCHEMATA";
   private static final String TABLES_TABLE = "TABLES";
   private static final String COLUMNS_TABLE = "COLUMNS";
-  private static final RowSignature SCHEMATA_SIGNATURE = RowSignature

Review Comment:
   @clintropolis, we  can make the needed change now. The thing you cited sounds like a Druid property, which means it applies to all columns? Or, is there a way to do the SQL thing and set the nullability bit on each column? That is, the other way to solve the issue is to say:
   
   ```java
   RowSignature
         .builder()
         ...
         .add("ORDINAL_POSITION", ColumnType.BIGINT) // Non-nullable
         ...
         .add("CHARACTER_MAXIMUM_LENGTH", ColumnType.BIGINT, true) // Nullable
   ```
   
   Do we have the ability to do something like the above today?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a diff in pull request #13777: Information schema now uses numeric column types

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis commented on code in PR #13777:
URL: https://github.com/apache/druid/pull/13777#discussion_r1103891904


##########
sql/src/main/java/org/apache/druid/sql/calcite/schema/InformationSchema.java:
##########
@@ -73,44 +73,65 @@ public class InformationSchema extends AbstractSchema
   private static final String SCHEMATA_TABLE = "SCHEMATA";
   private static final String TABLES_TABLE = "TABLES";
   private static final String COLUMNS_TABLE = "COLUMNS";
-  private static final RowSignature SCHEMATA_SIGNATURE = RowSignature

Review Comment:
   oh i see in the description something about nullable bigint, but I think the problem is probably using `RowSignatures.toRelDataType` which only allows numeric columns to be nullable if `druid.generic.useDefaultValueForNull=false`, which is valid for native druid engine, but probably isn't the correct behavior for these tables.
   
   I wonder if it would it be the correct behavior if these were executed by the Druid engine though and `druid.generic.useDefaultValueForNull=true`? I hope to switch the default to be `false` someday to be more sql compliant out of the box, but perhaps worth thinking about.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org