Posted to commits@druid.apache.org by "paul-rogers (via GitHub)" <gi...@apache.org> on 2023/03/03 20:08:03 UTC

[GitHub] [druid] paul-rogers commented on a diff in pull request #13686: Integrate the catalog with the Calcite planner

paul-rogers commented on code in PR #13686:
URL: https://github.com/apache/druid/pull/13686#discussion_r1124956771


##########
sql/src/main/codegen/includes/insert.ftl:
##########
@@ -21,13 +21,14 @@
 SqlNode DruidSqlInsertEof() :
 {
   SqlNode insertNode;
-  org.apache.druid.java.util.common.Pair<Granularity, String> partitionedBy = new org.apache.druid.java.util.common.Pair(null, null);
+  SqlNode partitionedBy = null;
   SqlNodeList clusteredBy = null;
 }
 {
   insertNode = SqlInsert()
-  // PARTITIONED BY is necessary, but is kept optional in the grammar. It is asserted that it is not missing in the
-  // DruidSqlInsert constructor so that we can return a custom error message.
+  // PARTITIONED BY is necessary. It can be provided either in this statement or in the catalog.

Review Comment:
   Yes and no. The above _syntax_ is allowed. Semantically, the above is allowed if the catalog provides partitioning but not clustering. (Which would be odd, but it is possible.) The above is _not_ allowed if there is no catalog entry: the user will get an error saying that partitioning is required.
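
To illustrate the semantics described above, here is a hedged sketch (not normative grammar documentation) that reuses the `webMetrics` datasource and `ext.webInput` table from the walkthrough in `docs/catalog/index.md` in this PR: with a catalog entry that declares `segmentGranularity`, the statement is accepted even though it has no `PARTITIONED BY` clause; without a catalog entry, the same statement fails validation with the "partitioning is required" error.

```sql
-- Hedged sketch: assumes a catalog entry for webMetrics that declares
-- segmentGranularity (for example P1D). PARTITIONED BY and CLUSTERED BY are
-- omitted here; the planner is expected to fill them in from the catalog.
INSERT INTO webMetrics
SELECT
  TIME_PARSE("timestamp") AS __time,
  host
FROM ext.webInput(filePattern => '2022-12-02/*.csv')
```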



##########
docs/catalog/api.md:
##########
@@ -0,0 +1,261 @@
+---
+id: intro
+title: Table Metadata Catalog
+sidebar_label: Intro
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+The table metadata feature adds a number of new REST APIs to create, read, update and delete
+(CRUD) table metadata entries. Table metadata is a set of "hints" on top of physical datasources, and
+a set of convenience entries for external tables. Changing the metadata "hints" has no effect
+on existing datasources or segments, or on in-flight ingestion jobs. Metadata changes only
+affect future queries and integestion jobs.
+
+**Note**: a future enhancment might allow a single operation to, say, delete both a metadata
+entry and the actual datasource. That functionality is not yet available. In this version, each
+metadata operation affects _only_ the information in the table metadata table within Druid's
+metadata store.
+
+A metadata table entry is called a "table specification" (or "table spec") and is represented
+by a Java class called `TableSpec`. The JSON representation of this object appears below.
+
+The API calls are designed for two use cases:
+
+* Configuration-as-code, in which the application uploads a complete table spec. Like in
+Kubernetes or other deployment tools, the source of truth for specs is in a source repostitory,
+typically Git. The user deploys new table specs to Druid as part of rolling out an updated
+application.
+* UI-driven, in which you use the Druid Console or similar tool to edit the table metadata
+directly.
+
+The configuration-as-code use case works with entire specs. The UI use case may work with an
+entire spec, or may perform focused "edit" operations. Here it is important to remember that
+Druid is a distributed, multi-user system: it could be that multiple users edit the same
+table at the same time. The API to overwrite a table spec provides a version number to allow
+a UI to implement optimistic locking: if the current verion is not the one the UI read, then
+Druid rejects the change. A set of "edit" operations perform focused chagnges that avoid the
+need for optimistic locking: Druid applies the change rather than the UI.
+
+All table metadata APIs begin with a common prefix: `/druid/coordinator/v1/catalog`.
+
+Tables are identified by a path: the schema and table name. Druid has a fixed, pre-defined set
+of schemas:
+
+* `druid`: Datasources
+* `ext`: External tables
+
+There are others, but those do not accept table metadata entries.
+
+## Configuration-as-Code
+
+CRUD for an entire `TableSpec`, optionally protected by a version.
+
+### `POST {prefix}/schemas/{schema}/tables/{table}[?version={n}|overwrite=true|false]`
+
+Create or update a `TableSpec`.
+
+* With no options, the semantics are "create if not exists."
+* With a version, the semantics are "update if exists and is at the given version"
+* With "overwrite", the semantics are "create or update"
+
+Use `overwrite=true` when some other system is the source of truth: you want to force
+the table metadata to match. Otherwise, first read the existing metadata, which provides
+the data version. Apply changes and set `version={n}` to the version retrieved. Druid will
+reject the chagne it other changes have occured since the data was read. The client should

Review Comment:
   Thanks! I wonder why our spell-checker didn't catch these? Or, maybe it will once the build gets that far...
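
For readers following the API discussion above, a hedged sketch of the optimistic-locking flow: read the table's metadata to obtain its `updateTime`, then POST the modified spec back with `version={updateTime}`. The payload below reuses only the spec shape and datasource properties documented elsewhere in this file; the field values are illustrative, and the (empty) column list is elided.

```json
{
  "type": "datasource",
  "properties": {
    "description": "Web metrics, daily segments",
    "segmentGranularity": "P1D",
    "clusterKeys": [ { "column": "host", "desc": false } ]
  },
  "columns": [ ]
}
```

If another client updated the entry after the read, the version no longer matches and Druid rejects the change; the client re-reads, reapplies its edits, and submits again. `overwrite=true` is only appropriate when some external system is the source of truth.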



##########
docs/catalog/api.md:
##########
@@ -0,0 +1,261 @@
+---
+id: intro
+title: Table Metadata Catalog
+sidebar_label: Intro
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+The table metadata feature adds a number of new REST APIs to create, read, update and delete
+(CRUD) table metadata entries. Table metadata is a set of "hints" on top of physical datasources, and
+a set of convenience entries for external tables. Changing the metadata "hints" has no effect
+on existing datasources or segments, or on in-flight ingestion jobs. Metadata changes only
+affect future queries and integestion jobs.
+
+**Note**: a future enhancment might allow a single operation to, say, delete both a metadata
+entry and the actual datasource. That functionality is not yet available. In this version, each
+metadata operation affects _only_ the information in the table metadata table within Druid's
+metadata store.
+
+A metadata table entry is called a "table specification" (or "table spec") and is represented
+by a Java class called `TableSpec`. The JSON representation of this object appears below.
+
+The API calls are designed for two use cases:
+
+* Configuration-as-code, in which the application uploads a complete table spec. Like in
+Kubernetes or other deployment tools, the source of truth for specs is in a source repostitory,
+typically Git. The user deploys new table specs to Druid as part of rolling out an updated
+application.
+* UI-driven, in which you use the Druid Console or similar tool to edit the table metadata
+directly.
+
+The configuration-as-code use case works with entire specs. The UI use case may work with an
+entire spec, or may perform focused "edit" operations. Here it is important to remember that
+Druid is a distributed, multi-user system: it could be that multiple users edit the same
+table at the same time. The API to overwrite a table spec provides a version number to allow
+a UI to implement optimistic locking: if the current verion is not the one the UI read, then
+Druid rejects the change. A set of "edit" operations perform focused chagnges that avoid the

Review Comment:
   Rewrote the passage to be a bit clearer. Please let me know if the new text reads better.



##########
docs/catalog/api.md:
##########
@@ -0,0 +1,261 @@
+---
+id: intro
+title: Table Metadata Catalog
+sidebar_label: Intro
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+The table metadata feature adds a number of new REST APIs to create, read, update and delete
+(CRUD) table metadata entries. Table metadata is a set of "hints" on top of physical datasources, and
+a set of convenience entries for external tables. Changing the metadata "hints" has no effect
+on existing datasources or segments, or on in-flight ingestion jobs. Metadata changes only
+affect future queries and integestion jobs.
+
+**Note**: a future enhancment might allow a single operation to, say, delete both a metadata
+entry and the actual datasource. That functionality is not yet available. In this version, each
+metadata operation affects _only_ the information in the table metadata table within Druid's
+metadata store.
+
+A metadata table entry is called a "table specification" (or "table spec") and is represented
+by a Java class called `TableSpec`. The JSON representation of this object appears below.
+
+The API calls are designed for two use cases:
+
+* Configuration-as-code, in which the application uploads a complete table spec. Like in
+Kubernetes or other deployment tools, the source of truth for specs is in a source repostitory,
+typically Git. The user deploys new table specs to Druid as part of rolling out an updated
+application.
+* UI-driven, in which you use the Druid Console or similar tool to edit the table metadata
+directly.
+
+The configuration-as-code use case works with entire specs. The UI use case may work with an
+entire spec, or may perform focused "edit" operations. Here it is important to remember that
+Druid is a distributed, multi-user system: it could be that multiple users edit the same
+table at the same time. The API to overwrite a table spec provides a version number to allow
+a UI to implement optimistic locking: if the current verion is not the one the UI read, then
+Druid rejects the change. A set of "edit" operations perform focused chagnges that avoid the
+need for optimistic locking: Druid applies the change rather than the UI.
+
+All table metadata APIs begin with a common prefix: `/druid/coordinator/v1/catalog`.
+
+Tables are identified by a path: the schema and table name. Druid has a fixed, pre-defined set
+of schemas:
+
+* `druid`: Datasources
+* `ext`: External tables
+
+There are others, but those do not accept table metadata entries.
+
+## Configuration-as-Code
+
+CRUD for an entire `TableSpec`, optionally protected by a version.
+
+### `POST {prefix}/schemas/{schema}/tables/{table}[?version={n}|overwrite=true|false]`
+
+Create or update a `TableSpec`.
+
+* With no options, the semantics are "create if not exists."
+* With a version, the semantics are "update if exists and is at the given version"
+* With "overwrite", the semantics are "create or update"
+
+Use `overwrite=true` when some other system is the source of truth: you want to force
+the table metadata to match. Otherwise, first read the existing metadata, which provides
+the data version. Apply changes and set `version={n}` to the version retrieved. Druid will
+reject the chagne it other changes have occured since the data was read. The client should
+read the new version, reapply the changes,and submit the change again.
+
+### `DELETE {prefix}/schemas/{schema}/tables/{table}`
+
+Removes the metadata for the given table. Note that this API _does not_ affect the datasource itself
+or any segments for the datasource. Deleting a table metadata entry simply says that you no longer
+wish to use the data governance features for that datasource.
+
+## Editing
+
+Edit operations apply a specific transform to an existing table spec. Because the transform is very
+specific, there is no need for a version as there is little scope for unintentional overwrites. The
+edit actions are specficially designed to be use by UI tools such as the Druid Console.
+
+### `POST {prefix}/schemas/{schema}/tables/{table}/edit`
+
+The payload is a message that says what to change:
+
+* Hide columns (add to the hidden columns list)
+* Unhide columns (remove from the hidden columns list)
+* Drop columns (remove items from the columns list)
+* Move columns
+* Update props (merge updated properties with existing)
+* Update columns (merge updated columns with existing)
+
+## Retrieval
+
+Finally, the API provides a variety of ways to pull information from the table metadata
+storage. Again, the information provided here is _only_ the metadata hints. Query the
+Druid system tables to get the "effective" schema for all Druid datasources. The effective
+schema includes not just the tables and columns defined in metadata, but any additional
+items that physically exist in segments.
+
+### `GET {prefix}/schemas[?format=name|path|metadata]`
+
+Returns one of:
+
+* `name`: The list of schema names (Default)
+* `path`: The list of all table paths (i.e. `{schema}.{table}` pairs)
+* `metadata`: A list of metadata for all tables in all schemas.
+
+### `GET {prefix}/schemas/{schema}`
+
+Not supported. Reserved to obtain information about the schema itself.
+
+### `GET {prefix}/schemas/{schema}/tables[?format=name|metadata`]
+
+Returns one of:
+
+* `name`: The list of table names within the schema.
+* `metadata`: The list of `TableMetadata` for each table within the schema.
+
+### `GET {prefix}/schemas/{schema}/tables/{table}[?format=spec|metadata|status`
+
+Retrieves information for one table. Formats are:
+
+* `spec`: The `TableSpec` (default)
+* `metadata`: the `TableMetadata`
+* `status`: the `TableMetadata` without the spec (that is, the name, state, creation date and update date)
+
+### Synchronization
+
+Druid uses two additional APIs internally to synchronize table metadata. The Coordinator
+stores the metadata, and services the APIs described above. The Broker uses the metadata
+to plan queries. These synchronization API keep the Broker in sync with the Coordinator.
+You should never need to use these (unless you are also creating a synchronization solution),
+but you may want to know about them to understand traffic beteen Druid servers.
+
+### `GET /sync/tables`
+
+ Returns a list of `TableSpec` objects for bootstrapping a Broker.
+
+Similar to`GET /entry/tables`, but that message returns metadata.
+
+### `POST /sync/delta`
+
+On the broker, receives update (delta) messages from the coordinator. The payload is an object:
+
+* Create/update: table path plus the `TableSpec`
+* Delete: table path only
+
+## Table Specification
+
+You submit a table spec when creating or updating a table metadataentry.
+The table spec object consists of four parts:
+
+* `type`: the kind of table described by the spec
+* `properties`: the set of properties for the table (see below)
+* `columns`: the set of columns for the table.
+
+Example:
+
+```json
+"spec": {
+  "type": "<type>",
+  "properties": { ... },
+  "columns": [ ... ]
+}
+```
+
+### Datasource Table Spec
+
+The syntax of a datasource table spec follows the general pattern. A datasource is
+described by a set of properties, as defined (LINK NEEDED). The type of a datasource
+table is `datasource`. Columns are of type `column`.
+
+Example:
+
+```json
+{
+   "type":"datasource",
+   "properties": {
+     "description": "<text>",
+     "segmentGranularity": "<period>",
+     "targetSegmentRows": <number>,
+     "clusterKeys": [ { "column": "<name">, "desc": true|false } ... ],
+     "hiddenColumns: [ "<name>" ... ]
+   },
+   "columns": [ ... ],
+}
+```
+
+### External Table Spec
+
+An external table spec has the same basic structure. Most external tables have just
+two properties:
+
+`source`: The JSON-serialized form of a Druid input source.
+`format`: The JSON-serialized form of a Druid input format.
+
+These two properties directly correspond to the JSON you would use with the MSQ
+`extern` statement. (In the catalog, the third argument, signature, is computed
+from the set of columns in the table spec.)
+
+See (LINK NEEDED) for details on the available input sources and formats, and on
+how to define a partial table completed with additional information at ingest time.
+
+### Table Metadata
+
+Druid returns a table metadata entry when you retrieve data for one or more tables.
+Each instance includes the table spec plus additional "meta-metadata" about the entry.
+
+* `id`: a map of two fields that gives the table path.
+  * `schema`: one of the Druid-supported schemas described above.
+  * `name`: table name
+* `creationTime`: UTC Timestamp (in milliseconds since the epoch) of the original creation
+time of the metadata entry (not actual datasource) was created.
+* `updateTime`: UTC Timestamp (in milliseconds since the epoch) of the most recent creation
+or update. This is the record's "version" when using optimistic locking. Return this in
+the `version={updateTime}` field when doing an update using optimistic locking.
+* `state`: For datasources. Normally ACTIVE. May be DELETING if datasource deletion is in

Review Comment:
   Actually, this field is also "meta-metadata": it is intended to be set by the system in your yet-to-be-implemented `DROP TABLE` SQL command.
   
   A possible confusion is that this state describes the metadata entry, not the datasource itself. The state marks that the table metadata entry exists but is not usable. It answers the "when can I reuse the table name" question. It takes time to delete a datasource, so we first mark the metadata (if it exists) as `DELETING`, then some code does the actual deletion work, then we remove the metadata entry. This prevents the user from issuing a `DROP TABLE foo` immediately followed by a `CREATE TABLE foo` before all segments of the old `foo` are fully deleted.
   
   This flag isn't used yet, but was added because of the extreme difficulty of updating Druid's metadata tables across releases: we have to handle both the upgrade and downgrade cases. So, we're speculating that we will need this soon, and adding it now.
   
   All attributes are about the table, so we could call all of them `tableId`, `tableSpec`, and so on. Since _only_ table-related data is here, the code elected to omit the "table" prefix, which would just repeat what is in the `TableMetadata` class name.
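
To make the "meta-metadata" distinction concrete, here is a hedged sketch of a returned `TableMetadata` entry, assembled only from the fields listed in the hunk above; the exact serialization may differ, and the values are illustrative.

```json
{
  "id": { "schema": "druid", "name": "webMetrics" },
  "creationTime": 1677871234000,
  "updateTime": 1677874567000,
  "state": "ACTIVE",
  "spec": {
    "type": "datasource",
    "properties": { "segmentGranularity": "P1D" },
    "columns": [ ]
  }
}
```

A `DELETING` state would mean the metadata entry still exists but the table name is not yet reusable; `updateTime` doubles as the version to pass back as `version={updateTime}` on the next optimistic-locking update.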
   



##########
docs/catalog/index.md:
##########
@@ -0,0 +1,314 @@
+---
+id: index
+title: Table Metadata Catalog
+sidebar_label: Introduction
+description: Introduces the table metadata features
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+Druid 26.0 introduces the idea of _metadata_ to describe datasources and external
+tables. External table metadata simplifies MSQ ingestion statements. Datasource
+metadata gives you another data governance tool to manage and communicate the data
+stored in a datasource. The user of table metadata is entirely optional: you can
+use it for no tables, some tables or all tables as your needs evolve.
+
+The table metadata feature consists of the main parts:
+
+* A new table in Druid's catalog (also known as the metadata store),
+* A set of new REST APIs to manage table metadata entries,
+* SQL integratoin to put the metadata to work.
+
+Table metadata is an experimental feature in this release. We encourage you to try
+it and to provide feedback. However, since the feature is experimental, some parts
+may change based on feedback.
+
+In this section, we use the term "table" to refer to both datasources and external
+tables. To SQL, these are both tables, and the table metadata feature treats them
+similarly.
+
+## Contents
+
+- [Introduction](./index.md) - Overview of the table metadata feature.
+- [Metadata API](./api.md) - Reference for the REST API operations to create and modify metadata.
+- [Datasource Metadata](./datasource.md) - Details for datasource metadata and partial tables.
+- [External Tables](./ext.md) - Details about external tables used in MSQ ingestion.
+
+
+## Enable Table Metadata Storage
+
+Table metadata storage resides in an extension which is not loaded by default. Instead,
+you must explicitly "opt in" to using this feature. Table metadata store is required
+for the other features described in this section.
+
+Enable table metadata storage by adding `druid-catalog` to your list of loaded
+extensions in `_common/common.runtime.properties`:
+
+```text
+druid.extensions.loadList=[..., "druid-catalog"]
+```
+
+Enabing this extension causes Druid to create the table metadata table and to
+make the table metadata REST APIs available for use.
+
+## When to Use Table Metadata
+
+People use Druid in many ways. When learning Druid, we often load data just to
+see how Druid works and to experiment. For this, "classic" Druid is ideal: just
+load data and go: there is no need for additional datasource metadata.
+
+As we put Druid to work, we craft an application around Druid: a data pipeline
+that feeds data into Druid, UI (dashboards, applications) which pull data out,
+and deployment scripts to keep the system running. At this point, it can be
+useful to design the table schema with as much care as the rest of the system.
+What columns are needed? What column names will be the most clear to the end
+user? What data type should each column have?
+
+Historically, developers encoded these decisions into Druid ingestion specs.
+With MSQ, that information is encoded into SQL statements. To control changes,
+the specs or SQL statements are versioned in Git, subject to reviews when changes
+occur.
+
+This approach works well when there is a single source of data for each datasource.
+It can become difficult to manage when we want to load multiple, different inputs
+into a single datasource, and we want to create a single, composite schema that
+represents all of them. Think, for example, of a metrics application in which
+data arrives from various metrics feeds such as logs and so on. We want to decide
+upon a common set of columns so that, say HTTP status is always stored as a string
+in a column called "http_status", even if some of the inputs use names such as
+"status", "httpCode", or "statusCode". This is, in fact, the classic [ETL
+(extract, transform and load)](https://en.wikipedia.org/wiki/Extract,_transform,_load)
+design challenge.
+
+In this case, having a known, defined datasource schema becomes essential. The job
+of each of many ingestion specs or MSQ queries is to transform the incoming data from
+the format in which it appears into the form we've designed for our datasource.
+While table metadata is generally useful, it is this many-to-one mapping scenario
+where it finds its greatest need.
+
+### Table Metadata is Optional
+
+Table metadata is a new feature. Existing Druid systems have physical datasource
+metadata, but no logical data. As a result, table metdata can be thought of as a set
+of "hints": optional information that, if present, provides additional features not
+available when only using the existing physical metadata. See the
+[datasource section](./datasource.md) for information on the use of metadata with
+datasources. Without external metadata, you can use the existing `extern()` table
+function to specify an MSQ input source, format and schema.
+
+### Table Metadata is Extensible
+
+This documentation explains the Druid-defined properties available for datasources
+and external tables. The metadata system is extensible: you can define other
+properties, as long as those properties use names distinct from the Druid-defined
+names. For example, you might want to track the source of the data, or might want
+to include display name or display format properties to help your UI display
+columns.
+
+## Overview of Table Metadata
+
+Here is a typical scenario to explain how table metadata works. Lets say we want
+to create a new datasource. (Although you can use table metadata with an existing
+datasource, this explaination will focus on a new one.) We know our input data is
+not quite what dashboard users expect, so we want to clean things up a bit.
+
+### Design the Datasource
+
+We start with pencil and paper (actually, a text editor or spreadsheet) and we
+list the input columns and their types. Next, we think about the name we want to
+appear for that column in our UI, and the data type we'd prefer. If we are going
+to use Druid's rollup feature, we also want to identify the aggregation we wish
+to use. The result might look like this:
+
+```text
+Input source: web-metrics.csv
+Datasource: webMetrics
+
+    Input Column               Datasource Column
+Name          Type        Name       Type      Aggregation
+----          ----        ----       ----      -----------
+timestamp     ISO string  __time     TIMESTAMP
+host          string      host       VARCHAR
+referer       string
+status        int         httpStatus VARCHAR
+request-size  int         bytesIn    BIGINT    SUM
+response-size int         bytesOut   BIGINT    SUM
+...
+```
+
+Here, we've decided we don't need the referer. For our metrics use, the referer
+might not be helpful, so we just ignore it.
+
+Notice that we use arbitrary types for the CSV input. CSV doesn't really have
+types. We'll need to convert these to SQL types later. For datasource columns,
+we use SQL types, not Druid types. There is a SQL type for each Druid type and
+Druid aggregation. See [Datasource metadata](./datasource.md) for details.
+
+For aggregates, we use a set of special aggregate types described [here](./datasource.md).
+In our case, we use `SUM(BIGINT)` for the aggregate types.
+
+If we are using rollup, we may want to truncate our timestamp. Druid rolls up
+to the millsecond level by default. Perhaps we want to roll up to the second
+level (that is, combine all data for a given host and status that occurs within
+any one second of time.) We do by specifying a parameter to the time data type:
+`TIMESTAMP('PT1S')`.
+
+A Druid datasource needs two additional properties: our time partitioning
+and clustering. [Time partitioning](../multi-stage-query/reference.html#partitioned-by)
+gives the size of the time chunk stored in each segment file. Perhaps our system is of
+moderate size, and we want our segments to hold a day of data, or `P1D`.
+
+We can also choose [clustering](../multi-stage-query/reference.html#clustered-by), which
+is handy as data sizes expand. Suppose we want to cluster on `host`.
+
+We have the option of "sealing" the schema, meaning that we _only_ want these columns
+to be ingested, but no others. We'll do that here. The default option is to be
+unsealed, meaning that ingestion can add new columns at will without the need to first
+declare them in the schema. Use the option that works best for your use case.
+
+We are now ready to define our datasource metadata. We can craft a JSON payload with the
+information (handy if we write a script to do the translation work.) Or, we
+can use the Druid web console. Either way, we create a metadata entry for
+our `webMetrics` datasource that contains the columns and types above.
+
+Here's an example of the JSON payload:
+
+```JSON
+TO DO
+```
+
+If using the REST API, we create the table using the (TODO) API (NEED LINK).
+
+### Define our Input Data
+
+We plan to load new data every day, just after midnight, so our ingestion frequency
+matches our time partitioning. (As our system grows, we will probably ingest more
+frequently, and perhaps shift to streaming ingestion, but let's start simple.)
+We don't want to repeat the same information on each ingest. Instead, we want to
+store that information once and reuse it.
+
+To make things simple, let's assume that Druid runs on only one server, so we can
+load from a local directory. Let's say we always put the data in `/var/metrics/web`
+and we name our directories with the date: `2022-12-02`, etc. Each directory has any
+number of files for that day. Let's further suppose that we know that our inputs are
+always CSV, and always have the columns we listed above.
+
+We model this in Druid by defining an external table that includes everything about
+the input _except_ the set of files (which change each day). Let's call our external
+table `webInput`. All external tables reside in the `ext` schema (namespace).
+
+We again use the (TODO) API, this time with a JSON payload for the eternal table.
+The bulk of the information is simliar to what we use in an MSQ `extern` function:
+the JSON serialized form of an input source and format. But, the input source is
+"partial", we leave off the actual files.
+
+Notice we use SQL types when defining the eternal table. Druid uses this information
+to parse the CSV file coluns into the desired SQL (and Druid) type.
+
+```JSON
+TO DO
+```
+
+### MSQ Ingest
+
+Our next step is to load some data. We'll use MSQ. We've defined our datasource, and
+the input files from which we'll read. We use MSQ to define the mapping from the
+input to the output.
+
+```sql
+INSERT INTO webMetrics
+SELECT
+  TIME_PARSE(timestamp) AS __time,
+  host,
+  CAST(status AS VARCHAR) AS httpStatus,
+  SUM("request-size") AS bytesIn,
+  SUM("response-size") AS bytesOut
+FROM ext.webInput(filePattern => '2202-12-02/*.csv')
+GROUP BY __time, host, status
+```
+
+Some things to notice:
+
+* Druid's `INSERT` statement has a special way to match columns. Unlike standard SQL,
+which matches by position, Druid SQL matches by name. Thus, we must use `AS` clauses
+to ensure our data has the same name as the column in our datasource.
+* We do not include a `TIME_FLOOR` function because Druid inserts that automatically
+based on the `TIMESTAMP('PT5M')` data type.
+* We include a `CAST` to convert `status` (a number) to `httpStatus`
+(as string). Druid will validate the type based on the table metadata.
+* We include the aggregations for the two aggregated fields.
+The aggregations must match those defined in table metadata.
+* We also must explicitly spell out the `GROUP BY` clause which must include all
+non-aggregated columns.
+* Notice that there is no `PARTITIONED BY` or `CLUSTERED BY` clauses: Druid fills those
+in from table metadata.
+
+If we make a mistake, and omit a required column alias, or use the wrong alias, the
+MSQ query will fail. This is part of the data governance idea: Druid just prevented us
+from creating segments that don't satisfy our declared table schema.
+
+### Query and Schema Evolution
+
+We can now query our data. Though it is not obvious in this example, the query experience
+is also guided by table metadata. To see this, suppose that we soon discover that no one
+actually needs the `bytesIn` field: our server doesn't accept `POST` requets and so all
+request are pretty small. To avoid confusing users, we want to remove that column. But,
+we've already ingested data. We could throw away the data and start over. Or, we could
+simply "hide" the unwanted column using the
+[hide columns](api.md) API.
+
+If we now do a `SELECT *` query we find that `bytesIn` is no longer available and we
+no longer have to explain to our users to ignore that column. While hiding a column is
+trivial in this case, imagine if you wanted to make that decision after loading a petabyte
+of data. Hiding a column gives you an immediate solution, even if it might take a long
+time to rewrite (compact) all the existign segments to physicaly remove the column.
+
+We have to change our MSQ ingest statement also to avoid loading any more of the
+now-unused column. This can happen at any time: MSQ ingest won't fail if we try to ingest
+into a hidden column.
+
+In a similar way, if we realize that we actually want `httpStatus` to be `BIGINT`, we
+can change the type, again using the [REST API](api.md).
+
+New ingestions will use the new type. Again, it may be costly to change existing data.
+Drill will, however, automatically `CAST` the old `VARCHAR` (string) data to the
+new `BIGINT` type, allow us to make the change immediately.
+
+## Details
+
+Now that we understand what table metadata is, and how we might use it, we are ready
+to dive into the technical reference material which follows.
+
+## Limitations
+
+This is the first version of the catalog feature. A number of limitations are known:
+
+* When using the catalog to specify column types for a table ingested via MSQ, the
+  resulting types are determined by the `SELECT` list, not the table schema. Druid will
+  ensure that the actual type is compatible with the declared type, the the actual type
+  may still differ from the declared type. Use a `CAST` to ensure the types are as
+  desiried.
+* The present version supports only the S3 cloud data source.
+* Catalog information is used only for MSQ ingestion, but not for classic batch or

Review Comment:
   Good question. This is the intro page. So, it first explains what the feature is and how it can be used. Then it lists limitations. As a compromise, the top section does mention the feature is new, and I added a link there to the Limitations section.
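
To make the first limitation in the list above concrete, here is a hedged sketch (reusing the walkthrough's names) of pinning the stored type with an explicit `CAST`, since the resulting type follows the `SELECT` list rather than the declared table schema.

```sql
-- Hedged sketch: the catalog declares httpStatus as VARCHAR, but the stored type
-- is determined by the SELECT list, so cast explicitly to keep the two in agreement.
-- PARTITIONED BY is again omitted and supplied from the catalog.
INSERT INTO webMetrics
SELECT
  TIME_PARSE("timestamp") AS __time,
  host,
  CAST(status AS VARCHAR) AS httpStatus
FROM ext.webInput(filePattern => '2022-12-02/*.csv')
```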



##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##########
@@ -1868,52 +1872,91 @@ private static Pair<List<DimensionSchema>, List<AggregatorFactory>> makeDimensio
 
     // Each column can be of either time, dimension, aggregator. For this method. we can ignore the time column.
     // For non-complex columns, If the aggregator factory of the column is not available, we treat the column as
-    // a dimension. For complex columns, certains hacks are in place.
+    // a dimension. For complex columns, certain hacks are in place.
     for (final String outputColumn : outputColumnsInOrder) {
-      final String queryColumn = columnMappings.getQueryColumnForOutputColumn(outputColumn);
-      final ColumnType type =
-          querySignature.getColumnType(queryColumn)
-                        .orElseThrow(() -> new ISE("No type for column [%s]", outputColumn));
-
-      if (!outputColumn.equals(ColumnHolder.TIME_COLUMN_NAME)) {
-
-        if (!type.is(ValueType.COMPLEX)) {
-          // non complex columns
-          populateDimensionsAndAggregators(
-              dimensions,
-              aggregators,
-              outputColumnAggregatorFactories,
-              outputColumn,
-              type
-          );
-        } else {
-          // complex columns only
-          if (DimensionHandlerUtils.DIMENSION_HANDLER_PROVIDERS.containsKey(type.getComplexTypeName())) {
-            dimensions.add(DimensionSchemaUtils.createDimensionSchema(outputColumn, type));
-          } else if (!isRollupQuery) {
-            aggregators.add(new PassthroughAggregatorFactory(outputColumn, type.getComplexTypeName()));
-          } else {
-            populateDimensionsAndAggregators(
-                dimensions,
-                aggregators,
-                outputColumnAggregatorFactories,
-                outputColumn,
-                type
-            );
-          }
-        }
+      final ColumnType type = computeStorageType(storageTypes, querySignature, columnMappings, outputColumn);
+
+      if (outputColumn.equals(ColumnHolder.TIME_COLUMN_NAME)) {
+        // Skip
+      } else if (!type.is(ValueType.COMPLEX)) {
+        // non complex columns
+        populateDimensionsAndAggregators(
+            dimensions,
+            aggregators,
+            outputColumnAggregatorFactories,
+            outputColumn,
+            type
+        );
+     // complex columns only
+      } else if (DimensionHandlerUtils.DIMENSION_HANDLER_PROVIDERS.containsKey(type.getComplexTypeName())) {
+        dimensions.add(DimensionSchemaUtils.createDimensionSchema(outputColumn, type));
+      } else if (!isRollupQuery) {
+        aggregators.add(new PassthroughAggregatorFactory(outputColumn, type.getComplexTypeName()));
+      } else {
+        populateDimensionsAndAggregators(
+            dimensions,
+            aggregators,
+            outputColumnAggregatorFactories,
+            outputColumn,
+            type
+        );
       }
     }
 
     return Pair.of(dimensions, aggregators);
   }
 
+  // TODO: This is a starter set of basic types.
+  private static final Map<String, ColumnType> COLUMN_TYPES = ImmutableMap.<String, ColumnType>builder()
+        .put(ColumnType.STRING.asTypeString(), ColumnType.STRING)
+        .put(ColumnType.LONG.asTypeString(), ColumnType.LONG)
+        .put(ColumnType.FLOAT.asTypeString(), ColumnType.FLOAT)
+        .put(ColumnType.DOUBLE.asTypeString(), ColumnType.DOUBLE)
+        .build();
+
+  /**
+   * Compute the storage (segment column) type for the given output column. The
+   * storage type is either provided explicitly, or is inferred from the query.
+   * <p>
+   * When the storage type differs from the query output type, the two must be
+   * compatible. That is, the appenderator must be able to convert from the query
+   * type to the storage type. The Calcite validator will have done this check,
+   * so that, by the time we get here, we can assume any required conversions
+   * are supported.
+   */
+  private static ColumnType computeStorageType(
+      final Map<String, String> storageTypes,
+      final RowSignature querySignature,
+      final ColumnMappings columnMappings,
+      final String outputColumn
+  )
+  {
+    // If output types are provided, and if this column has a type, and that type is valid,
+    // then use that type as the output type. The output type comes from a table schema
+    // declaration, such as the Druid catalog.
+    if (storageTypes != null) {
+      String outputType = storageTypes.get(outputColumn);
+      if (outputType != null) {
+        ColumnType type = COLUMN_TYPES.get(StringUtils.toUpperCase(outputType));
+        if (type != null) {
+          return type;
+        }

Review Comment:
   Good question. I was rather tentative here to avoid breaking anything. In particular, we've yet to sort out complex and aggregate types. After we gain confidence in this change, we can tighten up the logic to catch cases where the planner is out-of-sync with MSQ.
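
As a purely hypothetical sketch of the "tighten up later" idea: instead of silently falling back to the query's output type when a declared type is not in the supported map, the lookup could fail fast. The class and method names below are illustrative only (not part of this PR), and the imports assume Druid's usual utility packages.

```java
import java.util.Map;

import org.apache.druid.java.util.common.ISE;
import org.apache.druid.java.util.common.StringUtils;
import org.apache.druid.segment.column.ColumnType;

public class DeclaredTypeLookup
{
  /**
   * Hypothetical stricter variant of the lookup in computeStorageType(): reject an
   * unrecognized declared type rather than quietly using the query's output type.
   */
  static ColumnType lookupDeclaredType(
      final Map<String, ColumnType> supportedTypes,
      final String outputColumn,
      final String declaredType
  )
  {
    final ColumnType type = supportedTypes.get(StringUtils.toUpperCase(declaredType));
    if (type == null) {
      // The planner accepted a type that MSQ does not yet know how to store.
      throw new ISE("Declared storage type [%s] for column [%s] is not supported", declaredType, outputColumn);
    }
    return type;
  }
}
```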



##########
docs/catalog/index.md:
##########
@@ -0,0 +1,314 @@
+---
+id: index
+title: Table Metadata Catalog
+sidebar_label: Introduction
+description: Introduces the table metadata features
+---
+
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one
+  ~ or more contributor license agreements.  See the NOTICE file
+  ~ distributed with this work for additional information
+  ~ regarding copyright ownership.  The ASF licenses this file
+  ~ to you under the Apache License, Version 2.0 (the
+  ~ "License"); you may not use this file except in compliance
+  ~ with the License.  You may obtain a copy of the License at
+  ~
+  ~   http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing,
+  ~ software distributed under the License is distributed on an
+  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  ~ KIND, either express or implied.  See the License for the
+  ~ specific language governing permissions and limitations
+  ~ under the License.
+  -->
+
+Druid 26.0 introduces the idea of _metadata_ to describe datasources and external
+tables. External table metadata simplifies MSQ ingestion statements. Datasource
+metadata gives you another data governance tool to manage and communicate the data
+stored in a datasource. The user of table metadata is entirely optional: you can
+use it for no tables, some tables or all tables as your needs evolve.
+
+The table metadata feature consists of the main parts:
+
+* A new table in Druid's catalog (also known as the metadata store),
+* A set of new REST APIs to manage table metadata entries,
+* SQL integratoin to put the metadata to work.
+
+Table metadata is an experimental feature in this release. We encourage you to try
+it and to provide feedback. However, since the feature is experimental, some parts
+may change based on feedback.
+
+In this section, we use the term "table" to refer to both datasources and external
+tables. To SQL, these are both tables, and the table metadata feature treats them
+similarly.
+
+## Contents
+
+- [Introduction](./index.md) - Overview of the table metadata feature.
+- [Metadata API](./api.md) - Reference for the REST API operations to create and modify metadata.
+- [Datasource Metadata](./datasource.md) - Details for datasource metadata and partial tables.
+- [External Tables](./ext.md) - Details about external tables used in MSQ ingestion.
+
+
+## Enable Table Metadata Storage
+
+Table metadata storage resides in an extension which is not loaded by default. Instead,
+you must explicitly "opt in" to using this feature. Table metadata store is required
+for the other features described in this section.
+
+Enable table metadata storage by adding `druid-catalog` to your list of loaded
+extensions in `_common/common.runtime.properties`:
+
+```text
+druid.extensions.loadList=[..., "druid-catalog"]
+```
+
+Enabing this extension causes Druid to create the table metadata table and to
+make the table metadata REST APIs available for use.
+
+## When to Use Table Metadata
+
+People use Druid in many ways. When learning Druid, we often load data just to
+see how Druid works and to experiment. For this, "classic" Druid is ideal: just
+load data and go: there is no need for additional datasource metadata.
+
+As we put Druid to work, we craft an application around Druid: a data pipeline
+that feeds data into Druid, UI (dashboards, applications) which pull data out,
+and deployment scripts to keep the system running. At this point, it can be
+useful to design the table schema with as much care as the rest of the system.
+What columns are needed? What column names will be the most clear to the end
+user? What data type should each column have?
+
+Historically, developers encoded these decisions into Druid ingestion specs.
+With MSQ, that information is encoded into SQL statements. To control changes,
+the specs or SQL statements are versioned in Git, subject to reviews when changes
+occur.
+
+This approach works well when there is a single source of data for each datasource.
+It can become difficult to manage when we want to load multiple, different inputs
+into a single datasource, and we want to create a single, composite schema that
+represents all of them. Think, for example, of a metrics application in which
+data arrives from various metrics feeds such as logs and so on. We want to decide
+upon a common set of columns so that, say HTTP status is always stored as a string
+in a column called "http_status", even if some of the inputs use names such as
+"status", "httpCode", or "statusCode". This is, in fact, the classic [ETL
+(extract, transform and load)](https://en.wikipedia.org/wiki/Extract,_transform,_load)
+design challenge.
+
+In this case, having a known, defined datasource schema becomes essential. The job
+of each of many ingestion specs or MSQ queries is to transform the incoming data from
+the format in which it appears into the form we've designed for our datasource.
+While table metadata is generally useful, it is this many-to-one mapping scenario
+where it finds its greatest need.
+
+### Table Metadata is Optional
+
+Table metadata is a new feature. Existing Druid systems have physical datasource
+metadata, but no logical data. As a result, table metdata can be thought of as a set
+of "hints": optional information that, if present, provides additional features not
+available when only using the existing physical metadata. See the
+[datasource section](./datasource.md) for information on the use of metadata with
+datasources. Without external metadata, you can use the existing `extern()` table
+function to specify an MSQ input source, format and schema.
+
+### Table Metadata is Extensible
+
+This documentation explains the Druid-defined properties available for datasources
+and external tables. The metadata system is extensible: you can define other
+properties, as long as those properties use names distinct from the Druid-defined
+names. For example, you might want to track the source of the data, or might want
+to include display name or display format properties to help your UI display
+columns.
+
+## Overview of Table Metadata
+
+Here is a typical scenario to explain how table metadata works. Lets say we want
+to create a new datasource. (Although you can use table metadata with an existing
+datasource, this explaination will focus on a new one.) We know our input data is
+not quite what dashboard users expect, so we want to clean things up a bit.
+
+### Design the Datasource
+
+We start with pencil and paper (actually, a text editor or spreadsheet) and we
+list the input columns and their types. Next, we think about the name we want to
+appear for that column in our UI, and the data type we'd prefer. If we are going
+to use Druid's rollup feature, we also want to identify the aggregation we wish
+to use. The result might look like this:
+
+```text
+Input source: web-metrics.csv
+Datasource: webMetrics
+
+    Input Column               Datasource Column
+Name          Type        Name       Type      Aggregation
+----          ----        ----       ----      -----------
+timestamp     ISO string  __time     TIMESTAMP
+host          string      host       VARCHAR
+referer       string
+status        int         httpStatus VARCHAR
+request-size  int         bytesIn    BIGINT    SUM
+response-size int         bytesOut   BIGINT    SUM
+...
+```
+
+Here, we've decided we don't need the referer. For our metrics use, the referer
+might not be helpful, so we just ignore it.
+
+Notice that we use arbitrary types for the CSV input. CSV doesn't really have
+types. We'll need to convert these to SQL types later. For datasource columns,
+we use SQL types, not Druid types. There is a SQL type for each Druid type and
+Druid aggregation. See [Datasource metadata](./datasource.md) for details.
+
+For aggregates, we use a set of special aggregate types described [here](./datasource.md).
+In our case, we use `SUM(BIGINT)` for the aggregate types.
+
+If we are using rollup, we may want to truncate our timestamp. Druid rolls up
+to the millsecond level by default. Perhaps we want to roll up to the second
+level (that is, combine all data for a given host and status that occurs within
+any one second of time.) We do by specifying a parameter to the time data type:
+`TIMESTAMP('PT1S')`.
+
+A Druid datasource needs two additional properties: our time partitioning
+and clustering. [Time partitioning](../multi-stage-query/reference.html#partitioned-by)
+gives the size of the time chunk stored in each segment file. Perhaps our system is of
+moderate size, and we want our segments to hold a day of data, or `P1D`.
+
+We can also choose [clustering](../multi-stage-query/reference.html#clustered-by), which
+is handy as data sizes expand. Suppose we want to cluster on `host`.
+
+We have the option of "sealing" the schema, meaning that we _only_ want these columns
+to be ingested, but no others. We'll do that here. The default option is to be
+unsealed, meaning that ingestion can add new columns at will without the need to first
+declare them in the schema. Use the option that works best for your use case.
+
+We are now ready to define our datasource metadata. We can craft a JSON payload with the
+information (handy if we write a script to do the translation work.) Or, we
+can use the Druid web console. Either way, we create a metadata entry for
+our `webMetrics` datasource that contains the columns and types above.
+
+Here's an example of the JSON payload:
+
+```JSON
+TO DO
+```
+
+If using the REST API, we create the table using the (TODO) API (NEED LINK).
+
+### Define our Input Data
+
+We plan to load new data every day, just after midnight, so our ingestion frequency
+matches our time partitioning. (As our system grows, we will probably ingest more
+frequently, and perhaps shift to streaming ingestion, but let's start simple.)
+We don't want to repeat the same information on each ingest. Instead, we want to
+store that information once and reuse it.
+
+To make things simple, let's assume that Druid runs on only one server, so we can
+load from a local directory. Let's say we always put the data in `/var/metrics/web`
+and we name our directories with the date: `2022-12-02`, etc. Each directory has any
+number of files for that day. Let's further suppose that we know that our inputs are
+always CSV, and always have the columns we listed above.
+
+We model this in Druid by defining an external table that includes everything about
+the input _except_ the set of files (which change each day). Let's call our external
+table `webInput`. All external tables reside in the `ext` schema (namespace).
+
+We again use the (TODO) API, this time with a JSON payload for the eternal table.
+The bulk of the information is simliar to what we use in an MSQ `extern` function:
+the JSON serialized form of an input source and format. But, the input source is
+"partial", we leave off the actual files.
+
+Notice we use SQL types when defining the eternal table. Druid uses this information
+to parse the CSV file coluns into the desired SQL (and Druid) type.
+
+```JSON
+TO DO
+```
+
+### MSQ Ingest
+
+Our next step is to load some data. We'll use MSQ. We've defined our datasource, and
+the input files from which we'll read. We use MSQ to define the mapping from the
+input to the output.
+
+```sql
+INSERT INTO webMetrics
+SELECT
+  TIME_PARSE(timestamp) AS __time,
+  host,
+  CAST(status AS VARCHAR) AS httpStatus,
+  SUM("request-size") AS bytesIn,
+  SUM("response-size") AS bytesOut
+FROM ext.webInput(filePattern => '2202-12-02/*.csv')
+GROUP BY __time, host, status
+```
+
+Some things to notice:
+
+* Druid's `INSERT` statement has a special way to match columns. Unlike standard SQL,
+which matches by position, Druid SQL matches by name. Thus, we must use `AS` clauses
+to ensure our data has the same name as the column in our datasource.
+* We do not include a `TIME_FLOOR` function because Druid inserts that automatically
+based on the `TIMESTAMP('PT5M')` data type.
+* We include a `CAST` to convert `status` (a number) to `httpStatus`
+(as string). Druid will validate the type based on the table metadata.
+* We include the aggregations for the two aggregated fields.
+The aggregations must match those defined in table metadata.
+* We also must explicitly spell out the `GROUP BY` clause which must include all
+non-aggregated columns.
+* Notice that there is no `PARTITIONED BY` or `CLUSTERED BY` clauses: Druid fills those
+in from table metadata.
+
+If we make a mistake, and omit a required column alias, or use the wrong alias, the
+MSQ query will fail. This is part of the data governance idea: Druid just prevented us
+from creating segments that don't satisfy our declared table schema.
+
+### Query and Schema Evolution
+
+We can now query our data. Though it is not obvious in this example, the query experience
+is also guided by table metadata. To see this, suppose that we soon discover that no one
+actually needs the `bytesIn` field: our server doesn't accept `POST` requets and so all
+request are pretty small. To avoid confusing users, we want to remove that column. But,
+we've already ingested data. We could throw away the data and start over. Or, we could
+simply "hide" the unwanted column using the
+[hide columns](api.md) API.
+
+If we now do a `SELECT *` query we find that `bytesIn` is no longer available and we
+no longer have to explain to our users to ignore that column. While hiding a column is
+trivial in this case, imagine if you wanted to make that decision after loading a petabyte
+of data. Hiding a column gives you an immediate solution, even if it might take a long
+time to rewrite (compact) all the existign segments to physicaly remove the column.
+
+We have to change our MSQ ingest statement also to avoid loading any more of the
+now-unused column. This can happen at any time: MSQ ingest won't fail if we try to ingest
+into a hidden column.
+
+In a similar way, if we realize that we actually want `httpStatus` to be `BIGINT`, we
+can change the type, again using the [REST API](api.md).
+
+New ingestions will use the new type. Again, it may be costly to change existing data.
+Drill will, however, automatically `CAST` the old `VARCHAR` (string) data to the

Review Comment:
   Sigh... Yes, it should. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
