Posted to commits@druid.apache.org by "techdocsmith (via GitHub)" <gi...@apache.org> on 2023/03/02 21:35:36 UTC

[GitHub] [druid] techdocsmith commented on a diff in pull request #13787: Python Druid API for use in notebooks

techdocsmith commented on code in PR #13787:
URL: https://github.com/apache/druid/pull/13787#discussion_r1123739766


##########
examples/quickstart/jupyter-notebooks/api-tutorial.ipynb:
##########
@@ -458,11 +665,16 @@
     "- [Druid SQL API](https://druid.apache.org/docs/latest/querying/sql-api.html)\n",
     "- [API reference](https://druid.apache.org/docs/latest/operations/api-reference.html)\n",
     "\n",
-    "You can also try out the [druid-client](https://github.com/paul-rogers/druid-client), a Python library for Druid created by Paul Rogers, a Druid contributor.\n",
-    "\n",
-    "\n",
-    "\n"
+    "You can also try out the [druid-client](https://github.com/paul-rogers/druid-client), a Python library for Druid created by Paul Rogers, a Druid contributor. A simplified version of that library is included with these tutorials. See [the Python API Tutorial](Python_API_Tutorial.ipynb) for an overview. That tutorial shows how to do the same tasks as this one, but in a simpler form: focusing on the Druid actions and not the mechanics of the REST API."
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "386a05e5",
+   "metadata": {},
+   "outputs": [],
+   "source": []

Review Comment:
   empty code block



##########
examples/quickstart/jupyter-notebooks/-START HERE-.ipynb:
##########
@@ -0,0 +1,164 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "e415d732",
+   "metadata": {},
+   "source": [
+    "# Jupyter Notebook tutorials for Druid\n",
+    "\n",
+    "<!-- This README and the tutorial-jupyter-index.md file in docs/tutorials share a lot of the same content.\n",
+    "If you make a change in one place, update the other too. -->\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n",
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "\n",
+    "You can try out the Druid APIs using the Jupyter Notebook-based tutorials. These\n",
+    "tutorials provide snippets of Python code that you can use to run calls against\n",
+    "the Druid API to complete the tutorial."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "60015702",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "To get this far, you've installed Python 3 and Jupyter Notebook. Make sure you meet the following requirements before starting the Jupyter-based tutorials:\n",
+    "\n",
+    "- The `requests` package for Python. For example, you can install it with the following command:\n",
+    "\n",
+    "   ```bash\n",
+    "   pip3 install requests\n",
+    "   ```\n",
+    "\n",
+    "- JupyterLab (recommended) or Jupyter Notebook running on a non-default port. By default, Druid\n",
+    "  and Jupyter both try to use port `8888`, so start Jupyter on a different port.\n",
+    "\n",
+    "- An available Druid instance. You can use the local quickstart configuration\n",
+    "  described in [Quickstart](https://druid.apache.org/docs/latest/tutorials/index.html).\n",
+    "  The tutorials assume that you are using the quickstart, so no authentication or authorization\n",
+    "  is expected unless explicitly mentioned.\n",
+    "\n",
+    "## Simple Druid API\n",
+    "\n",
+    "One of the notebooks shows how to use the Druid REST API. The others focus on other\n",
+    "topics and use a simple set of Python wrappers around the underlying REST API. The\n",
+    "wrappers reside in the `druidapi` package within this directory. While the package\n",
+    "can be used in any Python program, the key purpose, at present, is to support these\n",
+    "notebooks. See the [Introduction to the Druid Python API](Python_API_Tutorial.ipynb)\n",
+    "for an overview of the Python API."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d9e18342",
+   "metadata": {},
+   "source": [
+    "## Tutorials\n",
+    "\n",
+    "The notebooks are located in the [apache/druid repo](\n",
+    "https://github.com/apache/druid/tree/master/examples/quickstart/jupyter-notebooks/).\n",
+    "You can either clone the repo or download the notebooks you want individually.\n",
+    "\n",
+    "The links that follow are the raw GitHub URLs, so you can use them to download the\n",
+    "notebook directly, such as with `wget`, or manually through your web browser. Note\n",
+    "that if you save the file from your web browser, make sure to remove the `.txt` extension.\n",
+    "\n",
+    "- [Introduction to the Druid REST API](api-tutorial.ipynb) walks you through some of the\n",
+    "  basics related to the Druid REST API and several endpoints.\n",
+    "- [Introduction to the Druid Python API](Python_API_Tutorial.ipynb) walks you through some of the\n",
+    "  basics related to the Druid API using the Python wrapper API.\n",
+    "- [Learn the basics of Druid SQL](sql-tutorial.ipynb) introduces you to the unique aspects of Druid SQL with the primary focus on the SELECT statement. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1a4b986a",
+   "metadata": {},
+   "source": [
+    "## Contributing\n",
+    "\n",
+    "If you build a Jupyter tutorial, you need to do a few things to add it to the docs\n",
+    "in addition to saving the notebook in this directory. The process requires two PRs to the repo.\n",
+    "\n",
+    "For the first PR, do the following:\n",
+    "\n",
+    "1. Depending on the goal of the notebook, you may want to clear the outputs from your notebook\n",
+    "   before you make the PR. You can use the following command:\n",
+    "\n",
+    "   ```bash\n",
+    "   jupyter nbconvert --ClearOutputPreprocessor.enabled=True --inplace ./path/to/notebook/notebookName.ipynb\n",
+    "   ```\n",
+    "   \n",
+    "   This can also be done in Jupyter Notebook itself: `Kernel` &rarr; `Restart & Clear Output`\n",
+    "\n",
+    "2. Create the PR as you normally would. Make sure to note that this PR is the one that\n",
+    "   contains only the Jupyter notebook and that there will be a subsequent PR that updates\n",
+    "   related pages.\n",
+    "\n",
+    "3. After this first PR is merged, grab the \"raw\" URL for the file from GitHub. For example,\n",
+    "   navigate to the file in the GitHub web UI and select **Raw**. Use the URL for this in the\n",
+    "   second PR as the download link.\n",
+    "\n",
+    "For the second PR, do the following:\n",
+    "\n",
+    "1. Update the list of [Tutorials](#tutorials) on this page and in the\n",
+    "   [Jupyter tutorial index page](../../../docs/tutorials/tutorial-jupyter-index.md#tutorials)\n",
+    "   in the `docs/tutorials` directory.\n",
+    "\n",
+    "2. Update `tutorial-jupyter-index.md` and provide the URL to the raw version of the file\n",
+    "   that becomes available after the first PR is merged.\n",
+    "\n",
+    "Note that you can skip the second PR, if you just copy the prefix link from one of the\n",
+    "existing notebook links when doing your first PR."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "5e6f2a0e",
+   "metadata": {},
+   "outputs": [],
+   "source": []

Review Comment:
   empty code block in notebook



##########
examples/quickstart/jupyter-notebooks/Python_API_Tutorial.ipynb:
##########
@@ -0,0 +1,751 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ce2efaaa",
+   "metadata": {},
+   "source": [
+    "# Learn the Druid Python API\n",
+    "\n",
+    "<!--\n",
+    "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
+    "  ~ or more contributor license agreements.  See the NOTICE file\n",
+    "  ~ distributed with this work for additional information\n",
+    "  ~ regarding copyright ownership.  The ASF licenses this file\n",
+    "  ~ to you under the Apache License, Version 2.0 (the\n",
+    "  ~ \"License\"); you may not use this file except in compliance\n",
+    "  ~ with the License.  You may obtain a copy of the License at\n",
+    "  ~\n",
+    "  ~   http://www.apache.org/licenses/LICENSE-2.0\n",
+    "  ~\n",
+    "  ~ Unless required by applicable law or agreed to in writing,\n",
+    "  ~ software distributed under the License is distributed on an\n",
+    "  ~ \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+    "  ~ KIND, either express or implied.  See the License for the\n",
+    "  ~ specific language governing permissions and limitations\n",
+    "  ~ under the License.\n",
+    "  -->\n",
+    "\n",
+    "This notebook provides a quick introduction to the Python wrapper around the [Druid REST API](api-tutorial.ipynb). This notebook assumes you are familiar with the basics of the REST API, and the [set of operations which Druid provides](https://druid.apache.org/docs/latest/operations/api-reference.html). This tutorial focuses on using Python to access those APIs rather than explaining the APIs themselves. The APIs themselves are covered in other notebooks that use the Python API.\n",
+    "\n",
+    "The Druid Python API is primarily intended to help with these notebook tutorials. It can also be used in your own ad-hoc notebooks, or in a regular Python program.\n",
+    "\n",
+    "The Druid Python API is a work in progress. The Druid team adds API wrappers as needed for the notebook tutorials. If you find you need additional wrappers, please feel free to add them, and post a PR to Apache Druid with your additions.\n",
+    "\n",
+    "The API provides two levels of functions. Most are simple wrappers around Druid's REST APIs. Others add additional code to make the API easier to use. The SQL query interface is a prime example: extra code translates a simple SQL query into Druid's `SQLQuery` object and interprets the results into a form that can be displayed in a notebook.\n",
+    "\n",
+    "This notebook contains sample output to allow it to work a bit like a reference. To run it yourself, start by using the `Kernel` &rarr; `Restart & Clear Output` menu command to clear the sample output.\n",
+    "\n",
+    "Start by importing the `druidapi` package from the same folder as this notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6d90ca5d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import druidapi"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fb68a838",
+   "metadata": {},
+   "source": [
+    "Next, connect to your cluster by providing the router endpoint. The code assumes the cluster is on your local machine, using the default port. Go ahead and change this if your setup is different.\n",
+    "\n",
+    "The API uses the router to forward messages to each of Druid's services so that you don't have to keep track of the host and port for each service.\n",
+    "\n",
+    "The `jupyter_client()` method waits for the cluster to be ready, and sets up the client to display tables and messages as HTML. To use this code without waiting and without HTML formatting, use the `client()` method instead."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ae601081",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "druid = druidapi.jupyter_client('http://localhost:8888')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8b4e774b",
+   "metadata": {},
+   "source": [
+    "## Status Client\n",
+    "\n",
+    "The SDK groups Druid REST API calls into categories, with a client for each. Start with the status client."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ff16fc3b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "status_client = druid.status"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "be992774",
+   "metadata": {},
+   "source": [
+    "Use the Python `help()` function to learn what methods are available."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "03f26417",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "help(status_client)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e803c9fe",
+   "metadata": {},
+   "source": [
+    "Check the version of your cluster. Some of these notebooks illustrate newer features available only on specific versions of Druid."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2faa0d81",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d78a6c35",
+   "metadata": {},
+   "source": [
+    "You can also check which extensions are loaded in your cluster. Some notebooks require specific extensions to be available."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1001f412",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "status_client.properties['druid.extensions.loadList']"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "012b2e61",
+   "metadata": {},
+   "source": [
+    "## Display Client\n",
+    "\n",
+    "The display client performs Druid operations, then formats the results for display in a notebook. Running SQL queries in a notebook is easy with the display client.\n",
+    "\n",
+    "When run outside a notebook, the display client formats results as text. The display client is the most convenient way to work with Druid in a notebook. Most operations also have a form that returns results as Python objects rather than displaying them. Use these methods if you write code to work with the results. Here the goal is just to interact with Druid."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f867f1f0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display = druid.display"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d051bc5e",
+   "metadata": {},
+   "source": [
+    "Start by getting a list of schemas."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dd8387e0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.schemas()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b8261ab0",
+   "metadata": {},
+   "source": [
+    "Then, retrieve the tables (or datasources) within any schema."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "64dcb46a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.tables('INFORMATION_SCHEMA')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ff311595",
+   "metadata": {},
+   "source": [
+    "By default, `tables()` without a schema argument shows the list of datasources. You'll get an empty result if you have no datasources yet."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "616770ce",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.tables()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7392e484",
+   "metadata": {},
+   "source": [
+    "You can easily run a query and show the results:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2c649eef",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "SELECT TABLE_NAME\n",
+    "FROM INFORMATION_SCHEMA.TABLES\n",
+    "WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'\n",
+    "'''\n",
+    "display.sql(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c6c4e1d4",
+   "metadata": {},
+   "source": [
+    "The query above showed the same results as `tables()`. That is not surprising: `tables()` just runs this query for you."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f414d145",
+   "metadata": {},
+   "source": [
+    "## SQL Client\n",
+    "\n",
+    "While the display client is handy for simple queries, sometimes you need more control, or want to work with the data returned from a query. For this you use the SQL client."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9951e976",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client = druid.sql"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7b944084",
+   "metadata": {},
+   "source": [
+    "The SQL client allows you to create a SQL request object that enables passing context parameters and query parameters. Druid will work out the query parameter type based on the Python type. Use the display client to show the query results."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dd559827",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "SELECT TABLE_NAME\n",
+    "FROM INFORMATION_SCHEMA.TABLES\n",
+    "WHERE TABLE_SCHEMA = ?\n",
+    "'''\n",
+    "req = sql_client.sql_request(sql)\n",
+    "req.add_parameter('INFORMATION_SCHEMA')\n",
+    "req.add_context(\"someParameter\", \"someValue\")\n",
+    "display.sql(req)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "937dc6b1",
+   "metadata": {},
+   "source": [
+    "The request has other features for advanced use cases: see the code for details. The query API actually returns a sql response object. Use this if you want to get the values directly, work with the schema, etc."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fd7a1827",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "SELECT TABLE_NAME\n",
+    "FROM INFORMATION_SCHEMA.TABLES\n",
+    "WHERE TABLE_SCHEMA = 'INFORMATION_SCHEMA'\n",
+    "'''\n",
+    "resp = sql_client.sql_query(sql)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2fe6a749",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "col1 = resp.schema[0]\n",
+    "print(col1.name, col1.sql_type, col1.druid_type)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "41d27bb1",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "resp.rows"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "481af1f2",
+   "metadata": {},
+   "source": [
+    "The `show()` method uses this information to format an HTML table to present the results."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8dba807b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "resp.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "99f8db7b",
+   "metadata": {},
+   "source": [
+    "The display and SQL clients are intended for exploratory queries. The [pydruid](https://pythonhosted.org/pydruid/) library provides a robust way to run native queries, to run SQL queries, and to convert the results to various formats."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9e3be017",
+   "metadata": {},
+   "source": [
+    "## MSQ Ingestion\n",
+    "\n",
+    "The SQL client also performs MSQ-based ingestion using `INSERT` or `REPLACE` statements. Use the extension check above to ensure that `druid-multi-stage-query` is loaded in Druid 26. (Later versions may have MSQ built in.)\n",
+    "\n",
+    "An MSQ query is run using a different API: `task()`. This API returns a response object that describes the Overlord task which runs the MSQ query. For tutorials, data is usually small enough you can wait for the ingestion to complete. Do that with the `run_task()` call which handles the waiting. To illustrate, here is a query that ingests a subset of columns, and includes a few data clean-up steps:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "10f1e451",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = '''\n",
+    "REPLACE INTO \"myWiki1\" OVERWRITE ALL\n",
+    "SELECT\n",
+    "  TIME_PARSE(\"timestamp\") AS \"__time\",\n",
+    "  namespace,\n",
+    "  page,\n",
+    "  channel,\n",
+    "  \"user\",\n",
+    "  countryName,\n",
+    "  CASE WHEN isRobot = 'true' THEN 1 ELSE 0 END AS isRobot,\n",
+    "  \"added\",\n",
+    "  \"delta\",\n",
+    "  CASE WHEN isNew = 'true' THEN 1 ELSE 0 END AS isNew,\n",
+    "  CAST(\"deltaBucket\" AS DOUBLE) AS deltaBucket,\n",
+    "  \"deleted\"\n",
+    "FROM TABLE(\n",
+    "  EXTERN(\n",
+    "    '{\"type\":\"http\",\"uris\":[\"https://druid.apache.org/data/wikipedia.json.gz\"]}',\n",
+    "    '{\"type\":\"json\"}',\n",
+    "    '[{\"name\":\"isRobot\",\"type\":\"string\"},{\"name\":\"channel\",\"type\":\"string\"},{\"name\":\"timestamp\",\"type\":\"string\"},{\"name\":\"flags\",\"type\":\"string\"},{\"name\":\"isUnpatrolled\",\"type\":\"string\"},{\"name\":\"page\",\"type\":\"string\"},{\"name\":\"diffUrl\",\"type\":\"string\"},{\"name\":\"added\",\"type\":\"long\"},{\"name\":\"comment\",\"type\":\"string\"},{\"name\":\"commentLength\",\"type\":\"long\"},{\"name\":\"isNew\",\"type\":\"string\"},{\"name\":\"isMinor\",\"type\":\"string\"},{\"name\":\"delta\",\"type\":\"long\"},{\"name\":\"isAnonymous\",\"type\":\"string\"},{\"name\":\"user\",\"type\":\"string\"},{\"name\":\"deltaBucket\",\"type\":\"long\"},{\"name\":\"deleted\",\"type\":\"long\"},{\"name\":\"namespace\",\"type\":\"string\"},{\"name\":\"cityName\",\"type\":\"string\"},{\"name\":\"countryName\",\"type\":\"string\"},{\"name\":\"regionIsoCode\",\"type\":\"string\"},{\"name\":\"metroCode\",\"type\":\"long\"},{\"name\":\"countryIsoCode\",\"type\":\"string\"},{\"name\":\"regionName\",\"type\":\"string\"}]'\n",
+    "  )\n",
+    ")\n",
+    "PARTITIONED BY DAY\n",
+    "CLUSTERED BY namespace, page\n",
+    "'''"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d752b1d4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client.run_task(sql)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ef4512f8",
+   "metadata": {},
+   "source": [
+    "MSQ reports task completion as soon as ingestion is done. However, it takes a while for Druid to load the resulting segments. Wait for the table to become ready."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "37fcedf2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client.wait_until_ready('myWiki1')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "11d9c95a",
+   "metadata": {},
+   "source": [
+    "`display.table()` lists the columns in a table."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b662697b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.table('myWiki1')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "936f57fb",
+   "metadata": {},
+   "source": [
+    "You can sample a few rows of data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c4cfa5dc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.sql('SELECT * FROM myWiki1 LIMIT 10')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c1152f41",
+   "metadata": {},
+   "source": [
+    "## Datasource Client\n",
+    "\n",
+    "The datasource client lets you perform operations on datasource objects. The SQL layer allows you to get metadata and do queries. The datasource client works with the underlying segments. Explaining the full functionality is the topic of another notebook. For now, you can use the datasource client to clean up the datasource created above. The `True` argument asks for \"if exists\" semantics so you don't get an error if the datasource was already deleted."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fba659ce",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ds_client = druid.datasources\n",
+    "ds_client.drop('myWiki1', True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c96fdcc6",
+   "metadata": {},
+   "source": [
+    "## Tasks Client\n",
+    "\n",
+    "Use the tasks client to work with Overlord tasks. The `run_task()` call above actually uses the task client internally to poll Overlord."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b4f5ea17",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "task_client = druid.tasks\n",
+    "task_client.tasks()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1deaf95f",
+   "metadata": {},
+   "source": [
+    "## REST Client\n",
+    "\n",
+    "The Druid Python API starts with a REST client that itself is built on the `requests` package. The REST client implements the common patterns seen in the Druid REST API. You can create a client directly:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b1e55635",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from druidapi.rest import DruidRestClient\n",
+    "rest_client = DruidRestClient(\"http://localhost:8888\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dcb8055f",
+   "metadata": {},
+   "source": [
+    "Or, if you have already created the Druid client, you can reuse the existing REST client. This is how the various other clients work internally."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "370ba76a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "rest_client = druid.rest"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2654e72c",
+   "metadata": {},
+   "source": [
+    "Use the REST client if you need to make calls that are not yet wrapped by the Python API, or if you want to do something special. To illustrate the client, you can make some of the same calls as in the [Druid REST API notebook](api-tutorial.ipynb).\n",
+    "\n",
+    "The REST client maintains the Druid host: you just provide the specific URL tail. There are methods to get or post JSON results. For example, to get status information:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9e42dfbc",
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "rest_client.get_json('/status')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "837e08b0",
+   "metadata": {},
+   "source": [
+    "A quick comparison of the three approaches (Requests, REST client, Python client):\n",
+    "\n",
+    "Status:\n",
+    "\n",
+    "* Requests: `session.get(druid_host + '/status').json()`\n",
+    "* REST client: `rest_client.get_json('/status')`\n",
+    "* Status client: `status_client.status()`\n",
+    "\n",
+    "Health:\n",
+    "\n",
+    "* Requests: `session.get(druid_host + '/status/health').json()`\n",
+    "* REST client: `rest_client.get_json('/status/health')`\n",
+    "* Status client: `status_client.is_healthy()`\n",
+    "\n",
+    "Ingest data:\n",
+    "\n",
+    "* Requests: See the [REST tutorial](api-tutorial.ipynb)\n",
+    "* REST client: as the REST tutorial, but use `rest_client.post_json('/druid/v2/sql/task', sql_request)` and\n",
+    "  `rest_client.get_json(f\"/druid/indexer/v1/task/{ingestion_taskId}/status\")`\n",
+    "* SQL client: `sql_client.run_task(sql)`, also a form for a full SQL request.\n",
+    "\n",
+    "List datasources:\n",
+    "\n",
+    "* Requests: `session.get(druid_host + '/druid/coordinator/v1/datasources').json()`\n",
+    "* REST client: `rest_client.get_json('/druid/coordinator/v1/datasources')`\n",
+    "* Datasources client: `ds_client.names()`\n",
+    "\n",
+    "Query data, where `sql_request` is a properly-formatted `SqlRequest` dictionary:\n",
+    "\n",
+    "* Requests: `session.post(druid_host + '/druid/v2/sql', json=sql_request).json()`\n",
+    "* REST client: `rest_client.post_json('/druid/v2/sql', sql_request)`\n",
+    "* SQL Client: `sql_client.show(sql)`, where `sql` is the query text\n",
+    "\n",
+    "In general, you have to provide all the details for the Requests library. The REST client handles the low-level repetitious bits. The Python clients provide methods that encapsulate the specifics of the URLs and return formats."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "edc4ee39",
+   "metadata": {},
+   "source": [
+    "## Constants\n",
+    "\n",
+    "Druid has a large number of special constants: type names, options, etc. The `consts` module provides definitions for many of these:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a90187c6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from druidapi import consts"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fc535898",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "help(consts)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b661b29f",
+   "metadata": {},
+   "source": [
+    "Using the constants avoids typos:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3393af62",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client.show_tables(consts.SYS_SCHEMA)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5e789ca7",
+   "metadata": {},
+   "source": [
+    "## Tracing\n",
+    "\n",
+    "It is often handy to see what the Druid API is doing: what messages it sends to Druid. You may need to debug some function that isn't working as expected. Or, perhaps you want to see what is sent to Druid so you can replicate it in your own code. Either way, just turn on tracing:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ac68b60e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "druid.trace(True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7b9dc7e3",
+   "metadata": {},
+   "source": [
+    "Then, each call to Druid prints what it sends:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "72c955c0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql_client.show_tables()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ddaf0dc2",
+   "metadata": {},
+   "source": [
+    "## Conclusion\n",
+    "\n",
+    "This notebook gave you a whirlwind tour of the Python Druid API: just enough to check your cluster, ingest some data with MSQ and query that data. Druid has many more APIs. As noted earlier, the Python API is a work in progress: the team adds new wrappers as needed for tutorials. Your [contributions](https://github.com/apache/druid/pulls) and [feedback](https://github.com/apache/druid/issues) are welcome."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0c9a9e4c",
+   "metadata": {},
+   "outputs": [],
+   "source": []

Review Comment:
   empty code block in notebook
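
A minimal sketch, not part of the PR, of one way to find the empty code cells flagged in this review before committing. It assumes the notebooks are plain nbformat JSON shaped like the diffs above. The function names `load_notebook` and `empty_code_cells` and the contents of the sample notebook are hypothetical, chosen only for illustration.

```python
import json


def load_notebook(path):
    """Read a .ipynb file as a plain dictionary (assumes nbformat JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def empty_code_cells(notebook):
    """Return the id (or the index, if there is no id) of each empty code cell."""
    found = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        # The source may be stored as one string or as a list of strings.
        text = source if isinstance(source, str) else "".join(source)
        if not text.strip():
            found.append(cell.get("id", index))
    return found


# Hypothetical miniature notebook, shaped like the cells quoted in the diff.
sample = {
    "cells": [
        {"cell_type": "markdown", "id": "ddaf0dc2", "source": ["## Conclusion\n"]},
        {"cell_type": "code", "id": "6d90ca5d", "source": ["import druidapi"]},
        {"cell_type": "code", "id": "0c9a9e4c", "source": []},
    ]
}

print(empty_code_cells(sample))  # prints ['0c9a9e4c']
```

To check a real file, pass the result of `load_notebook(...)` with the notebook's path to `empty_code_cells`. A non-empty result lists the cells to delete before pushing.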



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org