Posted to commits@druid.apache.org by "techdocsmith (via GitHub)" <gi...@apache.org> on 2023/02/02 02:53:37 UTC

[GitHub] [druid] techdocsmith commented on a diff in pull request #13465: docs: notebook only for SQL tutorial

techdocsmith commented on code in PR #13465:
URL: https://github.com/apache/druid/pull/13465#discussion_r1093946519


##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,774 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n",
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid SQL with the primary focus on the SELECT statement.\n",
+    "To learn about native queries, see [Native queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file of the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- The [`requests` package](https://requests.readthedocs.io/en/latest/user/install/) for Python\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or [Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a non-default port. Druid and Jupyter both default to port `8888`, so you need to start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the automatic single-machine configuration described in the [Druid quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no authentication or authorization is required unless explicitly mentioned. If you haven’t already, download Druid version 25.0 or higher and start Druid services as described in the quickstart.\n",
+    "\n",
+    "To start the tutorial, run the following cell. It imports the required Python packages and defines a variable for the Druid host, where the Router service listens."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b7f08a52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "# druid_host is the hostname and port for your Druid deployment. \n",
+    "# In a distributed environment, you can point to other Druid services.\n",
+    "# In this tutorial, you'll use the Router service as the `druid_host`.\n",
+    "druid_host = \"http://localhost:8888\"\n",
+    "dataSourceName = \"wikipedia-sql-tutorial\"\n",
+    "bold = '\\033[1m'\n",
+    "standard = '\\033[0m'\n",
+    "print(f\"{bold}Druid host{standard}: {druid_host}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e893ef7d-7136-442f-8bd9-31b5a5276518",
+   "metadata": {},
+   "source": [
+    "In the rest of the tutorial, the `endpoint`, `http_method`, and `payload` variables are updated to accomplish different tasks.\n",
+    "\n",
+    "## Druid SQL statements\n",
+    "\n",
+    "The following are the main Druid SQL statements:\n",
+    "\n",
+    "* SELECT: extract data from a datasource\n",
+    "* INSERT INTO: create a new datasource or append to an existing datasource\n",
+    "* REPLACE INTO: create a new datasource or overwrite data in an existing datasource\n",
+    "\n",
+    "Druid SQL does not support CREATE TABLE, DELETE, and DROP TABLE statements.\n",
+    "\n",
+    "## Ingest data\n",
+    "\n",
+    "You can use either INSERT INTO or REPLACE INTO to ingest data into a new datasource.\n",

Review Comment:
   ```suggestion
       "You can use either INSERT INTO or REPLACE INTO to create a datasource and ingest data.\n",
   ```
   Emphasize that these statements create the datasource if it doesn't exist.
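To the reviewer's point, a minimal sketch of how the notebook's request pattern could express this with REPLACE INTO instead (the schema list is truncated to two columns for readability, and the endpoint and context values mirror the ingestion cell later in the diff; sending the request assumes a running local Druid instance):

```python
import json

# Sketch only: REPLACE INTO, like INSERT INTO, creates the
# "wikipedia-sql-tutorial" datasource if it does not exist yet.
# OVERWRITE ALL additionally replaces any existing data.
sql = (
    'REPLACE INTO "wikipedia-sql-tutorial" '
    'OVERWRITE ALL '
    'SELECT TIME_PARSE("timestamp") AS __time, * '
    "FROM TABLE(EXTERN("
    "'{\"type\": \"http\", \"uris\": [\"https://druid.apache.org/data/wikipedia.json.gz\"]}', "
    "'{\"type\": \"json\"}', "
    "'[{\"name\": \"channel\", \"type\": \"string\"}, {\"name\": \"timestamp\", \"type\": \"string\"}]'"
    ")) "
    "PARTITIONED BY DAY"
)

# The MSQ task endpoint takes the query plus an optional context.
payload = json.dumps({"query": sql, "context": {"maxNumTasks": 3}})

# Posting requires a running Druid instance, e.g.:
# requests.post("http://localhost:8888/druid/v2/sql/task",
#               headers={"Content-Type": "application/json"}, data=payload)
print(payload)
```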



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,774 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n",
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid SQL with the primary focus on the SELECT statement.\n",
+    "To learn about native queries, see [Native queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file of the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- The [`requests` package](https://requests.readthedocs.io/en/latest/user/install/) for Python\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or [Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a non-default port. Druid and Jupyter both default to port `8888`, so you need to start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the automatic single-machine configuration described in the [Druid quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no authentication or authorization is required unless explicitly mentioned. If you haven’t already, download Druid version 25.0 or higher and start Druid services as described in the quickstart.\n",
+    "\n",
+    "To start the tutorial, run the following cell. It imports the required Python packages and defines a variable for the Druid host, where the Router service listens."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b7f08a52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "# druid_host is the hostname and port for your Druid deployment. \n",
+    "# In a distributed environment, you can point to other Druid services.\n",
+    "# In this tutorial, you'll use the Router service as the `druid_host`.\n",
+    "druid_host = \"http://localhost:8888\"\n",
+    "dataSourceName = \"wikipedia-sql-tutorial\"\n",

Review Comment:
   ```suggestion
       "dataSourceName = \"wikipedia-sql-tutorial\"\n",
       "\n",
       "# Set some basic output formatting.
   ```
   Nice to let the user know why we're doing this.
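For reference, the `bold` and `standard` values the suggested comment would explain are plain ANSI escape sequences, not anything Druid-specific. A small standalone illustration:

```python
# "\033[1m" turns on bold rendering in ANSI-capable terminals
# (including Jupyter output); "\033[0m" resets all styling.
bold = "\033[1m"
standard = "\033[0m"

# The escape codes ride along inside the string itself.
message = f"{bold}Druid host{standard}: http://localhost:8888"
print(message)
```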



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,774 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",

Review Comment:
   ```suggestion
       "# Learn the basics of Druid SQL\n",
   ```
   We are retiring the "Tutorial: " prefix convention in titles.



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,774 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n",
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid SQL with the primary focus on the SELECT statement.\n",
+    "To learn about native queries, see [Native queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file of the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- The [`requests` package](https://requests.readthedocs.io/en/latest/user/install/) for Python\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or [Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a non-default port. Druid and Jupyter both default to port `8888`, so you need to start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the automatic single-machine configuration described in the [Druid quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no authentication or authorization is required unless explicitly mentioned. If you haven’t already, download Druid version 25.0 or higher and start Druid services as described in the quickstart.\n",
+    "\n",
+    "To start the tutorial, run the following cell. It imports the required Python packages and defines a variable for the Druid host, where the Router service listens."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b7f08a52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "# druid_host is the hostname and port for your Druid deployment. \n",
+    "# In a distributed environment, you can point to other Druid services.\n",
+    "# In this tutorial, you'll use the Router service as the `druid_host`.\n",
+    "druid_host = \"http://localhost:8888\"\n",
+    "dataSourceName = \"wikipedia-sql-tutorial\"\n",
+    "bold = '\\033[1m'\n",
+    "standard = '\\033[0m'\n",
+    "print(f\"{bold}Druid host{standard}: {druid_host}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e893ef7d-7136-442f-8bd9-31b5a5276518",
+   "metadata": {},
+   "source": [
+    "In the rest of the tutorial, the `endpoint`, `http_method`, and `payload` variables are updated to accomplish different tasks.\n",
+    "\n",
+    "## Druid SQL statements\n",
+    "\n",
+    "The following are the main Druid SQL statements:\n",
+    "\n",
+    "* SELECT: extract data from a datasource\n",
+    "* INSERT INTO: create a new datasource or append to an existing datasource\n",
+    "* REPLACE INTO: create a new datasource or overwrite data in an existing datasource\n",
+    "\n",
+    "Druid SQL does not support CREATE TABLE, DELETE, and DROP TABLE statements.\n",
+    "\n",
+    "## Ingest data\n",
+    "\n",
+    "You can use either INSERT INTO or REPLACE INTO to ingest data into a new datasource.\n",
+    "INSERT INTO and REPLACE INTO statements both require the PARTITIONED BY clause which defines the granularity of time-based partitioning. For more information, see [Partitioning by time](https://druid.apache.org/docs/latest/multi-stage-query/concepts.html#partitioning-by-time).\n",
+    "\n",
+    "Run the following cell to ingest data from an external source into a table named `wikipedia-sql-tutorial` using the [multi-stage query (MSQ) task engine](https://druid.apache.org/docs/latest/multi-stage-query/index.html). The payload is included in the code block directly. If you have an existing datasource with the name `wikipedia-sql-tutorial`, use REPLACE INTO instead of INSERT INTO."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "045f782c-74d8-4447-9487-529071812b51",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql/task\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "\"query\": \"INSERT INTO \\\"wikipedia-sql-tutorial\\\" SELECT TIME_PARSE(\\\"timestamp\\\") \\\n",
+    "          AS __time, * FROM TABLE \\\n",
+    "          (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": [\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": \\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\
 "name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n",
+    "          PARTITIONED BY DAY\",\n",
+    "  \"context\": {\n",
+    "    \"maxNumTasks\": 3\n",
+    "  }\n",
+    "})\n",
+    "\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "ingestion_taskId_response = response\n",
+    "ingestion_taskId = json.loads(ingestion_taskId_response.text)['taskId']\n",
+    "print(f\"{bold}Query{standard}:\\n\" + payload)\n",
+    "print(f\"\\nInserting data into the table named {dataSourceName}\")\n",
+    "print(\"\\nThe response includes the task ID and the status: \" + response.text + \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ceb86ce0-85f6-4c63-8fd6-883033ee96e9",
+   "metadata": {},
+   "source": [
+    "Wait for ingestion to complete before proceeding.\n",
+    "To check on the status of your ingestion task, run the following cell.\n",
+    "It continuously fetches the status of the ingestion job until the ingestion job is complete."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "df12d12c-a067-4759-bae0-0410c24b6205",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "endpoint = f\"/druid/indexer/v1/task/{ingestion_taskId}/status\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"GET\"\n",
+    "\n",
+    "payload = {}\n",
+    "headers = {}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "ingestion_status = json.loads(response.text)['status']['status']\n",
+    "# If you only want to fetch the status once and print it, \n",
+    "# uncomment the print statement and comment out the if and while loops\n",
+    "# print(json.dumps(response.json(), indent=4))\n",
+    "\n",
+    "if ingestion_status == \"RUNNING\":\n",
+    "  print(\"The ingestion is running...\")\n",
+    "\n",
+    "while ingestion_status != \"SUCCESS\":\n",
+    "  response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "  ingestion_status = json.loads(response.text)['status']['status']\n",
+    "  time.sleep(15)  \n",
+    "  \n",
+    "if ingestion_status == \"SUCCESS\": \n",
+    "  print(\"The ingestion is complete:\")\n",
+    "  print(json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "240b0ad5-48f2-4737-b12b-5fd5f98da300",
+   "metadata": {},
+   "source": [
+    "## Datasources\n",
+    "\n",
+    "Druid supports a variety of datasources, with the table datasource being the most common. In Druid documentation, the word \"datasource\" often implicitly refers to the table datasource.\n",
+    "The [Datasources](https://druid.apache.org/docs/latest/querying/datasource.html) topic provides a comprehensive overview of datasources supported by Druid SQL.\n",
+    "\n",
+    "In Druid SQL, table datasources reside in the `druid` schema. This is the default schema, so table datasources can be referenced as either `druid.dataSourceName` or `dataSourceName`.\n",
+    "\n",
+    "For example, run the next cell to return the rows of the column named `channel` from the `wikipedia-sql-tutorial` table. Because this tutorial is running in Jupyter, the cells use the LIMIT clause to limit the size of the query results for display purposes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "91dd255a-4d55-493e-a067-4cef5c659657",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT \\\"channel\\\" FROM \\\"wikipedia-sql-tutorial\\\" LIMIT 7\"})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "\n",
+    "print(f\"{bold}Query{standard}:\\n\" + payload)\n",
+    "print(f\"\\nEach JSON object in the response represents a row in the {dataSourceName} datasource.\") \n",
+    "print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cbeb5a63",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Data types\n",
+    "\n",
+    "Druid maps SQL data types onto native types at query runtime.\n",
+    "The following native types are supported for Druid columns:\n",
+    "\n",
+    "* STRING: UTF-8 encoded strings and string arrays\n",
+    "* LONG: 64-bit signed int\n",
+    "* FLOAT: 32-bit float\n",
+    "* DOUBLE: 64-bit float\n",
+    "* COMPLEX: represents non-standard data types, such as nested JSON, hyperUnique and approxHistogram aggregators, and DataSketches aggregators\n",
+    "\n",
+    "Druid exposes table and column metadata through [INFORMATION_SCHEMA](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#information-schema) tables. Run the following query to retrieve metadata for the `wikipedia-sql-tutorial` datasource. In the response body, each JSON object correlates to a column in the table.\n",
+    "Check the objects' `DATA_TYPE` property for SQL data types. You should see TIMESTAMP, BIGINT, and VARCHAR SQL data types. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b9227d6c-1d8c-4169-b13b-a08625c4011f",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE \\\"TABLE_SCHEMA\\\" = 'druid' AND \\\"TABLE_NAME\\\" = 'wikipedia-sql-tutorial' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "\n",
+    "print(f\"{bold}Query{standard}:\\n\" + payload)\n",
+    "print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c59ca797-dd91-442b-8d02-67b711b3fcc6",
+   "metadata": {},
+   "source": [
+    "Druid natively interprets VARCHAR as STRING and BIGINT and TIMESTAMP SQL data types as LONG. For reference on how SQL data types map onto Druid native types, see [Standard types](https://druid.apache.org/docs/latest/querying/sql-data-types.html#standard-types).\n",
+    "\n",
+    "### Timestamp values\n",
+    "\n",
+    "Druid stores timestamp values as the number of milliseconds since the Unix epoch.\n",
+    "Primary timestamps are stored in a column named `__time`.\n",
+    "If a dataset doesn't have a timestamp, Druid uses the default value of `1970-01-01 00:00:00`.\n",
+    "\n",
+    "Druid time functions perform best when used with the `__time` column.\n",
+    "By default, time functions use the UTC time zone.\n",
+    "For more information about timestamp handling, see [Date and time functions](https://druid.apache.org/docs/latest/querying/sql-scalar.html#date-and-time-functions).\n",
+    "\n",
+    "Run the following cell to see a time function at work. This example uses the `TIME_IN_INTERVAL` function to query the `channel` and `page` columns of the `wikipedia-sql-tutorial` for rows whose timestamp is contained within the specified interval. The cell groups the results by columns."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7e3d62a-1325-4992-8bcd-c0f1925704bc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT channel, page\\nFROM \\\"wikipedia-sql-tutorial\\\" WHERE TIME_IN_INTERVAL(__time, '2016-06-27T00:05:54.56/2016-06-27T00:06:53')\\nGROUP BY channel, page\\nLIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "\n",
+    "print(f\"{bold}Query{standard}:\\n\" + payload)\n",
+    "print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f7cfdfae-ccba-49ba-a70f-63d0bd3527b2",
+   "metadata": {},
+   "source": [
+    "### NULL values\n",
+    "\n",
+    "Druid supports SQL compatible NULL handling, allowing string columns to distinguish empty strings from NULL and numeric columns to contain NULL rows. To store and query data in SQL compatible mode, explicitly set the `useDefaultValueForNull` property to `false` in `_common/common.runtime.properties`. See [Configuration reference](https://druid.apache.org/docs/latest/configuration/index.html) for common configuration properties.\n",
+    "\n",
+    "When `useDefaultValueForNull` is set to `true` (default behavior), Druid stores NULL values as `0` for numeric columns and as `''` for string columns."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29c24856",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## SELECT statement syntax\n",
+    "\n",
+    "Druid SQL supports SELECT statements with the following structure:\n",
+    "\n",
+    "``` mysql\n",
+    "[ EXPLAIN PLAN FOR ]\n",
+    "[ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]\n",
+    "SELECT [ ALL | DISTINCT ] { * | exprs }\n",
+    "FROM { <table> | (<subquery>) | <o1> [ INNER | LEFT ] JOIN <o2> ON condition }\n",

Review Comment:
   Not necessarily for this PR, but Druid also supports CROSS JOIN:
   https://github.com/apache/druid/pull/13404
   The docs for CROSS JOIN aren't sorted out yet, so I don't want it to hold up this PR.
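Since the syntax block above only lists INNER and LEFT JOIN, here is a hedged sketch of what a join query payload for this endpoint could look like. The subquery and column choices are illustrative, not from the PR, and running it assumes the `wikipedia-sql-tutorial` datasource from earlier cells plus a local Druid instance:

```python
import json

# Illustrative only: a self-join through a subquery, using one of the
# join forms the syntax block documents (INNER). CROSS JOIN also exists
# per apache/druid#13404 but is deliberately not shown here.
sql = """
SELECT w.channel, COUNT(*) AS edits
FROM "wikipedia-sql-tutorial" w
INNER JOIN (
  SELECT DISTINCT channel
  FROM "wikipedia-sql-tutorial"
  WHERE countryName = 'France'
) fr ON w.channel = fr.channel
GROUP BY w.channel
LIMIT 7
"""

payload = json.dumps({"query": sql})
# POST to http://localhost:8888/druid/v2/sql with a running Druid instance.
print(payload)
```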
   



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,774 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n",
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid SQL with the primary focus on the SELECT statement.\n",
+    "To learn about native queries, see [Native queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file of the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- The [`requests` package](https://requests.readthedocs.io/en/latest/user/install/) for Python\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or [Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a non-default port. Druid and Jupyter both default to port `8888`, so you need to start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the automatic single-machine configuration described in the [Druid quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no authentication or authorization is required unless explicitly mentioned. If you haven’t already, download Druid version 25.0 or higher and start Druid services as described in the quickstart.\n",
+    "\n",
+    "To start the tutorial, run the following cell. It imports the required Python packages and defines a variable for the Druid host, where the Router service listens."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b7f08a52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "# druid_host is the hostname and port for your Druid deployment. \n",
+    "# In a distributed environment, you can point to other Druid services.\n",
+    "# In this tutorial, you'll use the Router service as the `druid_host`.\n",
+    "druid_host = \"http://localhost:8888\"\n",
+    "dataSourceName = \"wikipedia-sql-tutorial\"\n",
+    "bold = '\\033[1m'\n",
+    "standard = '\\033[0m'\n",
+    "print(f\"{bold}Druid host{standard}: {druid_host}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e893ef7d-7136-442f-8bd9-31b5a5276518",
+   "metadata": {},
+   "source": [
+    "In the rest of the tutorial, the `endpoint`, `http_method`, and `payload` variables are updated to accomplish different tasks.\n",
+    "\n",
+    "## Druid SQL statements\n",
+    "\n",
+    "The following are the main Druid SQL statements:\n",
+    "\n",
+    "* SELECT: extract data from a datasource\n",
+    "* INSERT INTO: create a new datasource or append to an existing datasource\n",
+    "* REPLACE INTO: create a new datasource or overwrite data in an existing datasource\n",
+    "\n",
+    "Druid SQL does not support CREATE TABLE, DELETE, and DROP TABLE statements.\n",
+    "\n",
+    "## Ingest data\n",
+    "\n",
+    "You can use either INSERT INTO or REPLACE INTO to ingest data into a new datasource.\n",
+    "INSERT INTO and REPLACE INTO statements both require the PARTITIONED BY clause which defines the granularity of time-based partitioning. For more information, see [Partitioning by time](https://druid.apache.org/docs/latest/multi-stage-query/concepts.html#partitioning-by-time).\n",
+    "\n",
+    "Run the following cell to ingest data from an external source into a table named `wikipedia-sql-tutorial` using the [multi-stage query (MSQ) task engine](https://druid.apache.org/docs/latest/multi-stage-query/index.html). The payload is included in the code block directly. If you have an existing datasource with the name `wikipedia-sql-tutorial`, use REPLACE INTO instead of INSERT INTO."

Review Comment:
   I think we need to point out a couple of things about the query: 1. For the time parsing, maybe just say that we have an ISO 8601 time column, so we're parsing it. 2. The `*` says to ingest all of the columns in the file. Then I would mention that the EXTERN statement lets us define the data source type and the input schema, and link out for more information. We don't cover EXTERN anywhere else in this tutorial, so we want to give folks some way to learn about it.
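
   Something like the following could work as the replacement text (draft wording only; the EXTERN link target is my best guess, so please double-check it):
   ```suggestion
    "Run the following cell to ingest data from an external source into a table named `wikipedia-sql-tutorial` using the [multi-stage query (MSQ) task engine](https://druid.apache.org/docs/latest/multi-stage-query/index.html). Note the following about the query:\n",
    "- The file contains an ISO 8601 timestamp column, which the query parses with `TIME_PARSE` to populate the `__time` column.\n",
    "- The `*` ingests all of the remaining columns in the input file.\n",
    "- The `EXTERN` statement defines the input source type and the input schema. For more information, see [EXTERN](https://druid.apache.org/docs/latest/multi-stage-query/reference.html#extern).\n",
    "\n",
    "The payload is included in the code block directly. If you have an existing datasource with the name `wikipedia-sql-tutorial`, use REPLACE INTO instead of INSERT INTO."
   ```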



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,774 @@
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "045f782c-74d8-4447-9487-529071812b51",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql/task\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "\"query\": \"INSERT INTO \\\"wikipedia-sql-tutorial\\\" SELECT TIME_PARSE(\\\"timestamp\\\") \\\n",
+    "          AS __time, * FROM TABLE \\\n",
+    "          (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": [\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": \\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n",
+    "          PARTITIONED BY DAY\",\n",
+    "  \"context\": {\n",
+    "    \"maxNumTasks\": 3\n",
+    "  }\n",
+    "})\n",
+    "\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "ingestion_taskId_response = response\n",
+    "ingestion_taskId = json.loads(ingestion_taskId_response.text)['taskId']\n",
+    "print(f\"{bold}Query{standard}:\\n\" + payload)\n",
+    "print(f\"\\nInserting data into the table named {dataSourceName}\")\n",
+    "print(\"\\nThe response includes the task ID and the status: \" + response.text + \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ceb86ce0-85f6-4c63-8fd6-883033ee96e9",
+   "metadata": {},
+   "source": [
+    "Wait for ingestion to complete before proceeding.\n",
+    "To check on the status of your ingestion task, run the following cell.\n",
+    "It continuously fetches the status of the ingestion job until the ingestion job is complete."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "df12d12c-a067-4759-bae0-0410c24b6205",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "endpoint = f\"/druid/indexer/v1/task/{ingestion_taskId}/status\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"GET\"\n",
+    "\n",
+    "payload = {}\n",
+    "headers = {}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "ingestion_status = json.loads(response.text)['status']['status']\n",
+    "# If you only want to fetch the status once and print it, \n",
+    "# uncomment the print statement and comment out the if and while loops\n",
+    "# print(json.dumps(response.json(), indent=4))\n",
+    "\n",
+    "if ingestion_status == \"RUNNING\":\n",
+    "  print(\"The ingestion is running...\")\n",
+    "\n",
+    "while ingestion_status != \"SUCCESS\":\n",
+    "  response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "  ingestion_status = json.loads(response.text)['status']['status']\n",
+    "  time.sleep(15)  \n",
+    "  \n",
+    "if ingestion_status == \"SUCCESS\": \n",
+    "  print(\"The ingestion is complete:\")\n",
+    "  print(json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "240b0ad5-48f2-4737-b12b-5fd5f98da300",
+   "metadata": {},
+   "source": [
+    "## Datasources\n",
+    "\n",
+    "Druid supports a variety of datasources, with the table datasource being the most common. In Druid documentation, the word \"datasource\" often implicitly refers to the table datasource.\n",
+    "The [Datasources](https://druid.apache.org/docs/latest/querying/datasource.html) topic provides a comprehensive overview of datasources supported by Druid SQL.\n",
+    "\n",
+    "In Druid SQL, table datasources reside in the `druid` schema. This is the default schema, so table datasources can be referenced as either `druid.dataSourceName` or `dataSourceName`.\n",
+    "\n",
+    "For example, run the next cell to return the rows of the column named `channel` from the `wikipedia-sql-tutorial` table. Because this tutorial is running in Jupyter, the cells use the LIMIT clause to limit the size of the query results for display purposes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "91dd255a-4d55-493e-a067-4cef5c659657",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT \\\"channel\\\" FROM \\\"wikipedia-sql-tutorial\\\" LIMIT 7\"})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "\n",
+    "print(f\"{bold}Query{standard}:\\n\" + payload)\n",
+    "print(f\"\\nEach JSON object in the response represents a row in the {dataSourceName} datasource.\") \n",
+    "print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cbeb5a63",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Data types\n",
+    "\n",
+    "Druid maps SQL data types onto native types at query runtime.\n",
+    "The following native types are supported for Druid columns:\n",
+    "\n",
+    "* STRING: UTF-8 encoded strings and string arrays\n",
+    "* LONG: 64-bit signed int\n",
+    "* FLOAT: 32-bit float\n",
+    "* DOUBLE: 64-bit float\n",
+    "* COMPLEX: represents non-standard data types, such as nested JSON, hyperUnique and approxHistogram aggregators, and DataSketches aggregators\n",
+    "\n",
+    "Druid exposes table and column metadata through [INFORMATION_SCHEMA](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#information-schema) tables. Run the following query to retrieve metadata for the `wikipedia-sql-tutorial` datasource. In the response body, each JSON object correlates to a column in the table.\n",
+    "Check the objects' `DATA_TYPE` property for SQL data types. You should see TIMESTAMP, BIGINT, and VARCHAR SQL data types. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b9227d6c-1d8c-4169-b13b-a08625c4011f",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE \\\"TABLE_SCHEMA\\\" = 'druid' AND \\\"TABLE_NAME\\\" = 'wikipedia-sql-tutorial' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "\n",
+    "print(f\"{bold}Query{standard}:\\n\" + payload)\n",
+    "print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c59ca797-dd91-442b-8d02-67b711b3fcc6",
+   "metadata": {},
+   "source": [
+    "Druid natively interprets VARCHAR as STRING and BIGINT and TIMESTAMP SQL data types as LONG. For reference on how SQL data types map onto Druid native types, see [Standard types](https://druid.apache.org/docs/latest/querying/sql-data-types.html#standard-types).\n",
+    "\n",
+    "### Timestamp values\n",
+    "\n",
+    "Druid stores timestamp values as the number of milliseconds since the Unix epoch.\n",
+    "Primary timestamps are stored in a column named `__time`.\n",
+    "If a dataset doesn't have a timestamp, Druid uses the default value of `1970-01-01 00:00:00`.\n",
+    "\n",
+    "Druid time functions perform best when used with the `__time` column.\n",
+    "By default, time functions use the UTC time zone.\n",
+    "For more information about timestamp handling, see [Date and time functions](https://druid.apache.org/docs/latest/querying/sql-scalar.html#date-and-time-functions).\n",
+    "\n",
+    "Run the following cell to see a time function at work. This example uses the `TIME_IN_INTERVAL` function to query the `channel` and `page` columns of the `wikipedia-sql-tutorial` for rows whose timestamp is contained within the specified interval. The cell groups the results by columns."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7e3d62a-1325-4992-8bcd-c0f1925704bc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT channel, page\\nFROM \\\"wikipedia-sql-tutorial\\\" WHERE TIME_IN_INTERVAL(__time, '2016-06-27T00:05:54.56/2016-06-27T00:06:53')\\nGROUP BY channel, page\\nLIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "\n",
+    "print(f\"{bold}Query{standard}:\\n\" + payload)\n",
+    "print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f7cfdfae-ccba-49ba-a70f-63d0bd3527b2",
+   "metadata": {},
+   "source": [
+    "### NULL values\n",
+    "\n",
+    "Druid supports SQL compatible NULL handling, allowing string columns to distinguish empty strings from NULL and numeric columns to contain NULL rows. To store and query data in SQL compatible mode, explicitly set the `useDefaultValueForNull` property to `false` in `_common/common.runtime.properties`. See [Configuration reference](https://druid.apache.org/docs/latest/configuration/index.html) for common configuration properties.\n",
+    "\n",
+    "When `useDefaultValueForNull` is set to `true` (default behavior), Druid stores NULL values as `0` for numeric columns and as `''` for string columns."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29c24856",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## SELECT statement syntax\n",
+    "\n",
+    "Druid SQL supports SELECT statements with the following structure:\n",
+    "\n",
+    "``` mysql\n",
+    "[ EXPLAIN PLAN FOR ]\n",
+    "[ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]\n",
+    "SELECT [ ALL | DISTINCT ] { * | exprs }\n",
+    "FROM { <table> | (<subquery>) | <o1> [ INNER | LEFT ] JOIN <o2> ON condition }\n",
+    "[ WHERE expr ]\n",
+    "[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE (exprs) ] ]\n",
+    "[ HAVING expr ]\n",
+    "[ ORDER BY expr [ ASC | DESC ], expr [ ASC | DESC ], ... ]\n",
+    "[ LIMIT limit ]\n",
+    "[ OFFSET offset ]\n",
+    "[ UNION ALL <another query> ]\n",
+    "```\n",
+    "\n",
+    "As a general rule, use the LIMIT clause with `SELECT *` to limit the number of rows returned. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8cf212e5-fb3f-4206-acdd-46ef1da327ab",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## WHERE clause\n",
+    "\n",
+    "Druid SQL uses the [SQL WHERE clause](https://druid.apache.org/docs/latest/querying/sql.html#where) of a SELECT statement to fetch data based on a particular condition.\n",
+    "\n",
+    "In most cases, filtering your results by time using the WHERE clause improves query performance.\n",
+    "This is because Druid partitions data into time chunks and having a time range allows Druid to skip over unrelated data.\n",
+    "You can further partition segments within a time chunk using the CLUSTERED BY clause.\n",

Review Comment:
   ```suggestion
       "At ingestion time, you can further partition segments within a time chunk using the CLUSTERED BY clause.\n",
   ```
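
   In case it helps picture the end state, here is a rough sketch of an ingestion payload that uses the ingestion-time CLUSTERED BY clause, following the notebook's request pattern. The clustering column `channel` and the elided EXTERN arguments are placeholders, not content from this PR:

```python
import json

# Draft sketch (not part of the PR): what an ingestion query with an
# ingestion-time CLUSTERED BY clause might look like, following the
# notebook's payload pattern. The clustering column `channel` and the
# elided EXTERN arguments are placeholders.
payload = json.dumps({
    "query": (
        'REPLACE INTO "wikipedia-sql-tutorial" OVERWRITE ALL '
        'SELECT TIME_PARSE("timestamp") AS __time, * '
        "FROM TABLE(EXTERN(...)) "       # input source and schema elided
        "PARTITIONED BY DAY "            # required time-based partitioning
        "CLUSTERED BY channel"           # secondary partitioning within each day
    ),
    "context": {"maxNumTasks": 3},
})

print(json.loads(payload)["query"])
```

   Since CLUSTERED BY only applies at ingestion time, the suggested wording above calls that out explicitly.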



##########
examples/quickstart/jupyter-notebooks/sql-tutorial.ipynb:
##########
@@ -0,0 +1,774 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ad4e60b6",
+   "metadata": {
+    "deletable": true,
+    "editable": true,
+    "tags": []
+   },
+   "source": [
+    "# Tutorial: Learn the basics of Druid SQL\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n",
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "  \n",
+    "Apache Druid supports two query languages: Druid SQL and native queries.\n",
+    "Druid SQL is a Structured Query Language (SQL) dialect that enables you to query datasources in Apache Druid using SQL statements.\n",
+    "SQL and Druid SQL use similar syntax, with some notable differences.\n",
+    "Not all SQL functions are supported in Druid SQL. Instead, Druid includes Druid-specific SQL functions for optimized query performance.\n",
+    "\n",
+    "This interactive tutorial introduces you to the unique aspects of Druid SQL with the primary focus on the SELECT statement.\n",
+    "To learn about native queries, see [Native queries](https://druid.apache.org/docs/latest/querying/querying.html)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d6bbbcb",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Make sure that you meet the requirements outlined in the README.md file of the [apache/druid repo](https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "Specifically, you need the following:\n",
+    "- Knowledge of SQL\n",
+    "- [Python3](https://www.python.org/downloads/)\n",
+    "- The [`requests` package](https://requests.readthedocs.io/en/latest/user/install/) for Python\n",
+    "- [JupyterLab](https://jupyter.org/install#jupyterlab) (recommended) or [Jupyter Notebook](https://jupyter.org/install#jupyter-notebook) running on a non-default port. Druid and Jupyter both default to port `8888`, so you need to start Jupyter on a different port. \n",
+    "- An available Druid instance. This tutorial uses the automatic single-machine configuration described in the [Druid quickstart](https://druid.apache.org/docs/latest/tutorials/index.html), so no authentication or authorization is required unless explicitly mentioned. If you haven’t already, download Druid version 25.0 or higher and start Druid services as described in the quickstart.\n",
+    "\n",
+    "To start the tutorial, run the following cell. It imports the required Python packages and defines a variable for the Druid host, where the Router service listens."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b7f08a52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "import json\n",
+    "\n",
+    "# druid_host is the hostname and port for your Druid deployment. \n",
+    "# In a distributed environment, you can point to other Druid services.\n",
+    "# In this tutorial, you'll use the Router service as the `druid_host`.\n",
+    "druid_host = \"http://localhost:8888\"\n",
+    "dataSourceName = \"wikipedia-sql-tutorial\"\n",
+    "bold = '\\033[1m'\n",
+    "standard = '\\033[0m'\n",
+    "print(f\"{bold}Druid host{standard}: {druid_host}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e893ef7d-7136-442f-8bd9-31b5a5276518",
+   "metadata": {},
+   "source": [
+    "In the rest of the tutorial, the `endpoint`, `http_method`, and `payload` variables are updated to accomplish different tasks.\n",
+    "\n",
+    "## Druid SQL statements\n",
+    "\n",
+    "The following are the main Druid SQL statements:\n",
+    "\n",
+    "* SELECT: extract data from a datasource\n",
+    "* INSERT INTO: create a new datasource or append to an existing datasource\n",
+    "* REPLACE INTO: create a new datasource or overwrite data in an existing datasource\n",
+    "\n",
+    "Druid SQL does not support CREATE TABLE, DELETE, and DROP TABLE statements.\n",
+    "\n",
+    "## Ingest data\n",
+    "\n",
+    "You can use either INSERT INTO or REPLACE INTO to ingest data into a new datasource.\n",
+    "INSERT INTO and REPLACE INTO statements both require the PARTITIONED BY clause which defines the granularity of time-based partitioning. For more information, see [Partitioning by time](https://druid.apache.org/docs/latest/multi-stage-query/concepts.html#partitioning-by-time).\n",
+    "\n",
+    "Run the following cell to ingest data from an external source into a table named `wikipedia-sql-tutorial` using the [multi-stage query (MSQ) task engine](https://druid.apache.org/docs/latest/multi-stage-query/index.html). The payload is included in the code block directly. If you have an existing datasource with the name `wikipedia-sql-tutorial`, use REPLACE INTO instead of INSERT INTO."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "045f782c-74d8-4447-9487-529071812b51",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql/task\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "\"query\": \"INSERT INTO \\\"wikipedia-sql-tutorial\\\" SELECT TIME_PARSE(\\\"timestamp\\\") \\\n",
+    "          AS __time, * FROM TABLE \\\n",
+    "          (EXTERN('{\\\"type\\\": \\\"http\\\", \\\"uris\\\": [\\\"https://druid.apache.org/data/wikipedia.json.gz\\\"]}', '{\\\"type\\\": \\\"json\\\"}', '[{\\\"name\\\": \\\"added\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"channel\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"cityName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"comment\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"commentLength\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"countryIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"countryName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"deleted\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"delta\\\", \\\"type\\\": \\\"long\\\"}, {\\\"name\\\": \\\"deltaBucket\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"diffUrl\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"flags\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isAnonymous\\\", \\\"type\\\": \\\"string\\\"}, {\\\
 "name\\\": \\\"isMinor\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isNew\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isRobot\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"isUnpatrolled\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"metroCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"namespace\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"page\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionIsoCode\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"regionName\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"timestamp\\\", \\\"type\\\": \\\"string\\\"}, {\\\"name\\\": \\\"user\\\", \\\"type\\\": \\\"string\\\"}]')) \\\n",
+    "          PARTITIONED BY DAY\",\n",
+    "  \"context\": {\n",
+    "    \"maxNumTasks\": 3\n",
+    "  }\n",
+    "})\n",
+    "\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "ingestion_taskId_response = response\n",
+    "ingestion_taskId = json.loads(ingestion_taskId_response.text)['taskId']\n",
+    "print(f\"{bold}Query{standard}:\\n\" + payload)\n",
+    "print(f\"\\nInserting data into the table named {dataSourceName}\")\n",
+    "print(\"\\nThe response includes the task ID and the status: \" + response.text + \".\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ceb86ce0-85f6-4c63-8fd6-883033ee96e9",
+   "metadata": {},
+   "source": [
+    "Wait for ingestion to complete before proceeding.\n",
+    "To check on the status of your ingestion task, run the following cell.\n",
+    "It continuously fetches the status of the ingestion job until the ingestion job is complete."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "df12d12c-a067-4759-bae0-0410c24b6205",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import time\n",
+    "\n",
+    "endpoint = f\"/druid/indexer/v1/task/{ingestion_taskId}/status\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"GET\"\n",
+    "\n",
+    "payload = {}\n",
+    "headers = {}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "ingestion_status = json.loads(response.text)['status']['status']\n",
+    "# If you only want to fetch the status once and print it, \n",
+    "# uncomment the print statement and comment out the if and while loops\n",
+    "# print(json.dumps(response.json(), indent=4))\n",
+    "\n",
+    "if ingestion_status == \"RUNNING\":\n",
+    "  print(\"The ingestion is running...\")\n",
+    "\n",
+    "# Poll until the task either succeeds or fails.\n",
+    "while ingestion_status not in (\"SUCCESS\", \"FAILED\"):\n",
+    "  time.sleep(15)\n",
+    "  response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "  ingestion_status = json.loads(response.text)['status']['status']\n",
+    "\n",
+    "if ingestion_status == \"SUCCESS\":\n",
+    "  print(\"The ingestion is complete:\")\n",
+    "else:\n",
+    "  print(\"The ingestion task did not succeed:\")\n",
+    "print(json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "240b0ad5-48f2-4737-b12b-5fd5f98da300",
+   "metadata": {},
+   "source": [
+    "## Datasources\n",
+    "\n",
+    "Druid supports a variety of datasources, with the table datasource being the most common. In Druid documentation, the word \"datasource\" often implicitly refers to the table datasource.\n",
+    "The [Datasources](https://druid.apache.org/docs/latest/querying/datasource.html) topic provides a comprehensive overview of datasources supported by Druid SQL.\n",
+    "\n",
+    "In Druid SQL, table datasources reside in the `druid` schema. This is the default schema, so table datasources can be referenced as either `druid.dataSourceName` or `dataSourceName`.\n",
+    "\n",
+    "For example, run the next cell to return values from the `channel` column of the `wikipedia-sql-tutorial` table. Because this tutorial runs in Jupyter, the cells use the LIMIT clause to keep the query results small for display purposes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "91dd255a-4d55-493e-a067-4cef5c659657",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT \\\"channel\\\" FROM \\\"wikipedia-sql-tutorial\\\" LIMIT 7\"})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "\n",
+    "print(f\"{bold}Query{standard}:\\n\" + payload)\n",
+    "print(f\"\\nEach JSON object in the response represents a row in the {dataSourceName} datasource.\") \n",
+    "print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cbeb5a63",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## Data types\n",
+    "\n",
+    "Druid maps SQL data types onto native types at query runtime.\n",
+    "The following native types are supported for Druid columns:\n",
+    "\n",
+    "* STRING: UTF-8 encoded strings and string arrays\n",
+    "* LONG: 64-bit signed int\n",
+    "* FLOAT: 32-bit float\n",
+    "* DOUBLE: 64-bit float\n",
+    "* COMPLEX: represents non-standard data types, such as nested JSON, hyperUnique and approxHistogram aggregators, and DataSketches aggregators\n",
+    "\n",
+    "Druid exposes table and column metadata through [INFORMATION_SCHEMA](https://druid.apache.org/docs/latest/querying/sql-metadata-tables.html#information-schema) tables. Run the following query to retrieve metadata for the `wikipedia-sql-tutorial` datasource. In the response body, each JSON object correlates to a column in the table.\n",
+    "Check the objects' `DATA_TYPE` property for SQL data types. You should see TIMESTAMP, BIGINT, and VARCHAR SQL data types. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b9227d6c-1d8c-4169-b13b-a08625c4011f",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE \\\"TABLE_SCHEMA\\\" = 'druid' AND \\\"TABLE_NAME\\\" = 'wikipedia-sql-tutorial' LIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "\n",
+    "print(f\"{bold}Query{standard}:\\n\" + payload)\n",
+    "print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c59ca797-dd91-442b-8d02-67b711b3fcc6",
+   "metadata": {},
+   "source": [
+    "Druid maps the VARCHAR SQL data type to its native STRING type, and maps both the BIGINT and TIMESTAMP SQL data types to LONG. For reference on how SQL data types map onto Druid native types, see [Standard types](https://druid.apache.org/docs/latest/querying/sql-data-types.html#standard-types).\n",
+    "\n",
+    "### Timestamp values\n",
+    "\n",
+    "Druid stores timestamp values as the number of milliseconds since the Unix epoch.\n",
+    "Primary timestamps are stored in a column named `__time`.\n",
+    "If a dataset doesn't have a timestamp, Druid uses the default value of `1970-01-01 00:00:00`.\n",
+    "\n",
+    "Druid time functions perform best when used with the `__time` column.\n",
+    "By default, time functions use the UTC time zone.\n",
+    "For more information about timestamp handling, see [Date and time functions](https://druid.apache.org/docs/latest/querying/sql-scalar.html#date-and-time-functions).\n",
+    "\n",
+    "Run the following cell to see a time function at work. This example uses the `TIME_IN_INTERVAL` function to query the `channel` and `page` columns of the `wikipedia-sql-tutorial` for rows whose timestamp is contained within the specified interval. The cell groups the results by columns."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7e3d62a-1325-4992-8bcd-c0f1925704bc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "endpoint = \"/druid/v2/sql\"\n",
+    "print(f\"{bold}Query endpoint{standard}: {druid_host+endpoint}\")\n",
+    "http_method = \"POST\"\n",
+    "\n",
+    "payload = json.dumps({\n",
+    "  \"query\":\"SELECT channel, page\\nFROM \\\"wikipedia-sql-tutorial\\\" WHERE TIME_IN_INTERVAL(__time, '2016-06-27T00:05:54.56/2016-06-27T00:06:53')\\nGROUP BY channel, page\\nLIMIT 7\"\n",
+    "})\n",
+    "headers = {'Content-Type': 'application/json'}\n",
+    "\n",
+    "response = requests.request(http_method, druid_host+endpoint, headers=headers, data=payload)\n",
+    "\n",
+    "print(f\"{bold}Query{standard}:\\n\" + payload)\n",
+    "print(f\"\\n{bold}Response{standard}: \\n\" + json.dumps(response.json(), indent=4))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f7cfdfae-ccba-49ba-a70f-63d0bd3527b2",
+   "metadata": {},
+   "source": [
+    "### NULL values\n",
+    "\n",
+    "Druid supports SQL-compatible NULL handling, which allows string columns to distinguish empty strings from NULL and numeric columns to contain NULL rows. To store and query data in SQL-compatible mode, explicitly set the `druid.generic.useDefaultValueForNull` property to `false` in `_common/common.runtime.properties`. See [Configuration reference](https://druid.apache.org/docs/latest/configuration/index.html) for common configuration properties.\n",
+    "\n",
+    "When `druid.generic.useDefaultValueForNull` is set to `true` (the default), Druid stores NULL values as `0` in numeric columns and as `''` in string columns."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "29c24856",
+   "metadata": {
+    "deletable": true,
+    "tags": []
+   },
+   "source": [
+    "## SELECT statement syntax\n",
+    "\n",
+    "Druid SQL supports SELECT statements with the following structure:\n",
+    "\n",
+    "``` mysql\n",
+    "[ EXPLAIN PLAN FOR ]\n",
+    "[ WITH tableName [ ( column1, column2, ... ) ] AS ( query ) ]\n",
+    "SELECT [ ALL | DISTINCT ] { * | exprs }\n",
+    "FROM { <table> | (<subquery>) | <o1> [ INNER | LEFT ] JOIN <o2> ON condition }\n",
+    "[ WHERE expr ]\n",
+    "[ GROUP BY [ exprs | GROUPING SETS ( (exprs), ... ) | ROLLUP (exprs) | CUBE (exprs) ] ]\n",
+    "[ HAVING expr ]\n",
+    "[ ORDER BY expr [ ASC | DESC ], expr [ ASC | DESC ], ... ]\n",
+    "[ LIMIT limit ]\n",
+    "[ OFFSET offset ]\n",
+    "[ UNION ALL <another query> ]\n",
+    "```\n",
+    "\n",
+    "As a general rule, use the LIMIT clause with `SELECT *` to limit the number of rows returned. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8cf212e5-fb3f-4206-acdd-46ef1da327ab",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "## WHERE clause\n",
+    "\n",
+    "Druid SQL uses the [SQL WHERE clause](https://druid.apache.org/docs/latest/querying/sql.html#where) of a SELECT statement to fetch data based on a particular condition.\n",
+    "\n",
+    "In most cases, filtering your results by time using the WHERE clause improves query performance.\n",
+    "Druid partitions data into time chunks, so a time filter allows Druid to skip data outside the queried range entirely.\n",
+    "You can further partition segments within a time chunk using the CLUSTERED BY clause.\n",
+    "In general, clustering on a particular dimension improves locality and compressibility.\n",

Review Comment:
   ```suggestion
       "In general, clustering on a particular dimension improves locality and compressibility.\n",
       "At query time, using the WHERE clause to filter on clustered dimensions can improve query performance.
   ```
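To make the suggestion concrete, a sketch of a query payload that filters on both `__time` and a dimension might look like the following. It mirrors the `requests` pattern used in the notebook cells; the host value is an assumption, the channel filter is hypothetical, and the HTTP call itself is left commented out so the snippet runs without a live cluster:

```python
import json

# Assumed connection details, carried over from earlier notebook cells.
druid_host = "http://localhost:8888"   # Router address (assumption)
endpoint = "/druid/v2/sql"

# Filtering on __time lets Druid skip whole time chunks; an additional
# filter on a clustered dimension (channel, hypothetically) narrows the
# scan within each chunk.
payload = json.dumps({
    "query": (
        "SELECT channel, COUNT(*) AS edits "
        "FROM \"wikipedia-sql-tutorial\" "
        "WHERE TIME_IN_INTERVAL(__time, '2016-06-27/2016-06-28') "
        "AND channel = '#en.wikipedia' "
        "GROUP BY channel"
    )
})
headers = {"Content-Type": "application/json"}
print(payload)

# Against a running cluster, send it the same way as the other cells:
# response = requests.request("POST", druid_host + endpoint, headers=headers, data=payload)
```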



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: commits-help@druid.apache.org