Posted to commits@druid.apache.org by vo...@apache.org on 2023/05/18 01:12:28 UTC

[druid-website] branch asf-staging updated: update 26 docs for RC2

This is an automated email from the ASF dual-hosted git repository.

vogievetsky pushed a commit to branch asf-staging
in repository https://gitbox.apache.org/repos/asf/druid-website.git


The following commit(s) were added to refs/heads/asf-staging by this push:
     new 7700df8a update 26 docs for RC2
7700df8a is described below

commit 7700df8aeaabf5a0edbd894b9c7adf6c9f09663c
Author: Vadim Ogievetsky <va...@ogievetsky.com>
AuthorDate: Wed May 17 18:12:21 2023 -0700

    update 26 docs for RC2
---
 docs/26.0.0/configuration/index.html      |  8 ++---
 docs/26.0.0/ingestion/ingestion-spec.html | 32 ++++++++++++-------
 docs/26.0.0/ingestion/schema-design.html  | 52 ++++++++++++++++++++++++++-----
 docs/26.0.0/ingestion/tasks.html          |  2 +-
 docs/latest/configuration/index.html      |  8 ++---
 docs/latest/ingestion/ingestion-spec.html | 32 ++++++++++++-------
 docs/latest/ingestion/schema-design.html  | 52 ++++++++++++++++++++++++++-----
 docs/latest/ingestion/tasks.html          |  2 +-
 8 files changed, 140 insertions(+), 48 deletions(-)

diff --git a/docs/26.0.0/configuration/index.html b/docs/26.0.0/configuration/index.html
index 9e222179..b2c4d50e 100644
--- a/docs/26.0.0/configuration/index.html
+++ b/docs/26.0.0/configuration/index.html
@@ -1088,7 +1088,7 @@ However, if you need to do it via HTTP, the JSON object can be submitted to the
 <tr><td><code>killDataSourceWhitelist</code></td><td>List of specific data sources for which kill tasks are sent if property <code>druid.coordinator.kill.on</code> is true. This can be a list of comma-separated data source names or a JSON array.</td><td>none</td></tr>
 <tr><td><code>killPendingSegmentsSkipList</code></td><td>List of data sources for which pendingSegments are <em>NOT</em> cleaned up if property <code>druid.coordinator.kill.pendingSegments.on</code> is true. This can be a list of comma-separated data sources or a JSON array.</td><td>none</td></tr>
 <tr><td><code>maxSegmentsInNodeLoadingQueue</code></td><td>The maximum number of segments that can be queued for loading to any given server. This parameter can be used to speed up the segment loading process, especially if there are &quot;slow&quot; nodes in the cluster (with low loading speed) or if too many segments are scheduled to be replicated to a particular node (faster loading may be preferred over better segment distribution). Desired value depends on segment loading speed, a [...]
-<tr><td><code>useRoundRobinSegmentAssignment</code></td><td>Boolean flag for whether segments should be assigned to historicals in a round robin fashion. When disabled, segment assignment is done using the chosen balancer strategy. When enabled, this can speed up segment assignments leaving balancing to move the segments to their optimal locations (based on the balancer strategy) lazily.</td><td>false</td></tr>
+<tr><td><code>useRoundRobinSegmentAssignment</code></td><td>Boolean flag for whether segments should be assigned to historicals in a round-robin fashion. When disabled, segment assignment is done using the chosen balancer strategy. When enabled, this can speed up segment assignment, leaving the balancing process to lazily move segments to their optimal locations (based on the balancer strategy).</td><td>true</td></tr>
 <tr><td><code>decommissioningNodes</code></td><td>List of historical servers to 'decommission'. Coordinator will not assign new segments to 'decommissioning' servers,  and segments will be moved away from them to be placed on non-decommissioning servers at the maximum rate specified by <code>decommissioningMaxPercentOfMaxSegmentsToMove</code>.</td><td>none</td></tr>
 <tr><td><code>decommissioningMaxPercentOfMaxSegmentsToMove</code></td><td>Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of <code>maxSegmentsToMove</code>.<br /><br />If <code>decommissioningMaxPercentOfMaxSegmentsToMove</code> is 0, the Coordinator does not move segments to decomm [...]
 <tr><td><code>pauseCoordination</code></td><td>Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the <code>CoordinatorDuty</code> Interface. Such duties include: Segment balancing, Segment compaction, Emission of metrics controlled by the dynamic coordinator config <code>em [...]
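The dynamic configuration properties in the table above are submitted together as a single JSON object; as noted above the table, the object can also be posted to the Coordinator over HTTP (the /druid/coordinator/v1/config endpoint). A minimal sketch with illustrative values only; the host:port entry is hypothetical:

    {
      "useRoundRobinSegmentAssignment": true,
      "maxSegmentsInNodeLoadingQueue": 500,
      "decommissioningNodes": ["historical-host:8083"],
      "decommissioningMaxPercentOfMaxSegmentsToMove": 70
    }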
@@ -1248,7 +1248,7 @@ The below is a list of the supported configurations for auto-compaction.</p>
 <tr><td><code>druid.indexer.storage.type</code></td><td>Choices are &quot;local&quot; or &quot;metadata&quot;. Indicates whether incoming tasks should be stored locally (in heap) or in metadata storage. &quot;local&quot; is mainly for internal testing while &quot;metadata&quot; is recommended in production because storing incoming tasks in metadata storage allows for tasks to be resumed if the Overlord should fail.</td><td>local</td></tr>
 <tr><td><code>druid.indexer.storage.recentlyFinishedThreshold</code></td><td>Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.</td><td>PT24H</td></tr>
 <tr><td><code>druid.indexer.tasklock.forceTimeChunkLock</code></td><td><em><strong>Setting this to false is still experimental</strong></em><br/> If set, all tasks are enforced to use time chunk lock. If not set, each task automatically chooses a lock type to use. This configuration can be overwritten by setting <code>forceTimeChunkLock</code> in the <a href="/docs/26.0.0/ingestion/tasks.html#context">task context</a>. See <a href="/docs/26.0.0/ingestion/tasks.html#context">Task Locking  [...]
-<tr><td><code>druid.indexer.tasklock.batchSegmentAllocation</code></td><td>If set to true, Druid performs segment allocate actions in batches to improve throughput and reduce the average <code>task/action/run/time</code>. See <a href="/docs/26.0.0/ingestion/tasks.html#batching-segmentallocate-actions">batching <code>segmentAllocate</code> actions</a> for details.</td><td>false</td></tr>
+<tr><td><code>druid.indexer.tasklock.batchSegmentAllocation</code></td><td>If set to true, Druid performs segment allocate actions in batches to improve throughput and reduce the average <code>task/action/run/time</code>. See <a href="/docs/26.0.0/ingestion/tasks.html#batching-segmentallocate-actions">batching <code>segmentAllocate</code> actions</a> for details.</td><td>true</td></tr>
 <tr><td><code>druid.indexer.tasklock.batchAllocationWaitTime</code></td><td>Number of milliseconds after Druid adds the first segment allocate action to a batch, until it executes the batch. Allows the batch to add more requests and improve the average segment allocation run time. This configuration takes effect only if <code>batchSegmentAllocation</code> is enabled.</td><td>500</td></tr>
 <tr><td><code>druid.indexer.task.default.context</code></td><td>Default task context that is applied to all tasks submitted to the Overlord. Any default in this config overrides neither the context values the user provides nor <code>druid.indexer.tasklock.forceTimeChunkLock</code>.</td><td>empty context</td></tr>
 <tr><td><code>druid.indexer.queue.maxSize</code></td><td>Maximum number of active tasks at one time.</td><td>Integer.MAX_VALUE</td></tr>
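As a sketch of the shape of druid.indexer.task.default.context, its value is a JSON object of task context parameters; the keys below are illustrative picks from the task context documentation:

    {
      "storeEmptyColumns": true,
      "priority": 75
    }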
@@ -1661,7 +1661,7 @@ ensure at least this amount of direct memory is available by providing <code>-XX
 <tr><td><code>druid.indexer.task.hadoopWorkingPath</code></td><td>Temporary working directory for Hadoop tasks.</td><td><code>/tmp/druid-indexing</code></td></tr>
 <tr><td><code>druid.indexer.task.restoreTasksOnRestart</code></td><td>If true, MiddleManagers will attempt to stop tasks gracefully on shutdown and restore them on restart.</td><td>false</td></tr>
 <tr><td><code>druid.indexer.task.ignoreTimestampSpecForDruidInputSource</code></td><td>If true, tasks using the <a href="/docs/26.0.0/ingestion/native-batch-input-sources.html">Druid input source</a> will ignore the provided timestampSpec, and will use the <code>__time</code> column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.</td><td>false</td></tr>
-<tr><td><code>druid.indexer.task.storeEmptyColumns</code></td><td>Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the <a href="/docs/26.0.0/ingestion/ingestion-spec.html#dimensionsspec"><code>dimensionsSpec</code></a>. If you use schemaless ingestion and don't specify any dimensions to ingest, you must also set <a href="/docs/26.0.0/ingestion/ingestion-spec.html#dimensionsspec"><code>includeAllDimensions</ [...]
+<tr><td><code>druid.indexer.task.storeEmptyColumns</code></td><td>Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the <a href="/docs/26.0.0/ingestion/ingestion-spec.html#dimensionsspec"><code>dimensionsSpec</code></a>. If you use string-based schemaless ingestion and don't specify any dimensions to ingest, you must also set <a href="/docs/26.0.0/ingestion/ingestion-spec.html#dimensionsspec"><code>inclu [...]
 <tr><td><code>druid.indexer.task.tmpStorageBytesPerTask</code></td><td>Maximum number of bytes per task to be used to store temporary files on disk. This usage is split among all temporary storage usages for the task. An exception might be thrown if this limit is too low for the task or if this limit would be exceeded. This limit is currently respected only by MSQ tasks. Other types of tasks might exceed this limit. A value of -1 disables this limit.</td><td>-1</td></tr>
 <tr><td><code>druid.indexer.server.maxChatRequests</code></td><td>Maximum number of concurrent requests served by a task's chat handler. Set to 0 to disable limiting.</td><td>0</td></tr>
 </tbody>
@@ -1735,7 +1735,7 @@ then the value from the configuration below is used:</p>
 <tr><td><code>druid.indexer.task.hadoopWorkingPath</code></td><td>Temporary working directory for Hadoop tasks.</td><td><code>/tmp/druid-indexing</code></td></tr>
 <tr><td><code>druid.indexer.task.restoreTasksOnRestart</code></td><td>If true, the Indexer will attempt to stop tasks gracefully on shutdown and restore them on restart.</td><td>false</td></tr>
 <tr><td><code>druid.indexer.task.ignoreTimestampSpecForDruidInputSource</code></td><td>If true, tasks using the <a href="/docs/26.0.0/ingestion/native-batch-input-sources.html">Druid input source</a> will ignore the provided timestampSpec, and will use the <code>__time</code> column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.</td><td>false</td></tr>
-<tr><td><code>druid.indexer.task.storeEmptyColumns</code></td><td>Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the <a href="/docs/26.0.0/ingestion/ingestion-spec.html#dimensionsspec"><code>dimensionsSpec</code></a>. If you use schemaless ingestion and don't specify any dimensions to ingest, you must also set <a href="/docs/26.0.0/ingestion/ingestion-spec.html#dimensionsspec"><code>includeAllDimensions</ [...]
+<tr><td><code>druid.indexer.task.storeEmptyColumns</code></td><td>Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the <a href="/docs/26.0.0/ingestion/ingestion-spec.html#dimensionsspec"><code>dimensionsSpec</code></a>. <br/><br/>If you set <code>storeEmptyColumns</code> to false, Druid SQL queries referencing empty columns will fail. If you intend to leave <code>storeEmptyColumns</code> disabled, you shoul [...]
 <tr><td><code>druid.peon.taskActionClient.retry.minWait</code></td><td>The minimum retry time to communicate with Overlord.</td><td>PT5S</td></tr>
 <tr><td><code>druid.peon.taskActionClient.retry.maxWait</code></td><td>The maximum retry time to communicate with Overlord.</td><td>PT1M</td></tr>
 <tr><td><code>druid.peon.taskActionClient.retry.maxRetryCount</code></td><td>The maximum number of retries to communicate with Overlord.</td><td>60</td></tr>
diff --git a/docs/26.0.0/ingestion/ingestion-spec.html b/docs/26.0.0/ingestion/ingestion-spec.html
index ce81d041..6fed134a 100644
--- a/docs/26.0.0/ingestion/ingestion-spec.html
+++ b/docs/26.0.0/ingestion/ingestion-spec.html
@@ -77,7 +77,7 @@
   ~ specific language governing permissions and limitations
   ~ under the License.
   -->
-<p>All ingestion methods use ingestion tasks to load data into Druid. Streaming ingestion uses ongoing supervisors that run and supervise a set of tasks over time. Native batch and Hadoop-based ingestion use a one-time <a href="/docs/26.0.0/ingestion/tasks.html">task</a>. All types of ingestion use an <em>ingestion spec</em> to configure ingestion.</p>
+<p>All ingestion methods use ingestion tasks to load data into Druid. Streaming ingestion uses ongoing supervisors that run and supervise a set of tasks over time. Native batch and Hadoop-based ingestion use a one-time <a href="/docs/26.0.0/ingestion/tasks.html">task</a>. Except for SQL-based ingestion, you use an <em>ingestion spec</em> to configure your ingestion.</p>
 <p>Ingestion specs consist of three main components:</p>
 <ul>
 <li><a href="#dataschema"><code>dataSchema</code></a>, which configures the <a href="#datasource">datasource name</a>,
@@ -223,15 +223,20 @@ your ingestion spec.</p>
 <p>Treat <code>__time</code> as a millisecond timestamp: the number of milliseconds since Jan 1, 1970 at midnight UTC.</p>
 <h3><a class="anchor" aria-hidden="true" id="dimensionsspec"></a><a href="#dimensionsspec" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0- [...]
 <p>The <code>dimensionsSpec</code> is located in <code>dataSchema</code> → <code>dimensionsSpec</code> and is responsible for
-configuring <a href="/docs/26.0.0/ingestion/data-model.html#dimensions">dimensions</a>. An example <code>dimensionsSpec</code> is:</p>
-<pre><code class="hljs"><span class="hljs-string">"dimensionsSpec"</span> : {
-  <span class="hljs-string">"dimensions"</span>: [
-    <span class="hljs-string">"page"</span>,
-    <span class="hljs-string">"language"</span>,
-    { <span class="hljs-string">"type"</span>: <span class="hljs-string">"long"</span>, <span class="hljs-string">"name"</span>: <span class="hljs-string">"userId"</span> }
+configuring <a href="/docs/26.0.0/ingestion/data-model.html#dimensions">dimensions</a>.</p>
+<p>You can either specify the dimensions manually or take advantage of schema auto-discovery, where Druid infers all or some of the schema for your data. This means that you don't have to explicitly specify each dimension and its type.</p>
+<p>To use schema auto-discovery, set <code>useSchemaDiscovery</code> to <code>true</code>.</p>
+<p>Alternatively, you can use string-based schemaless ingestion, where any discovered dimensions are treated as strings. To do so, leave <code>useSchemaDiscovery</code> set to <code>false</code> (the default). Then, either leave the dimensions list empty or set the <code>includeAllDimensions</code> property to <code>true</code>.</p>
+<p>The following <code>dimensionsSpec</code> example uses schema auto-discovery (<code>&quot;useSchemaDiscovery&quot;: true</code>) in conjunction with explicitly defined dimensions to have Druid infer some of the schema for the data:</p>
+<pre><code class="hljs css language-json">"dimensionsSpec" : {
+  "dimensions": [
+    "page",
+    "language",
+    { "type": "long", "name": "userId" }
   ],
-  <span class="hljs-string">"dimensionExclusions"</span> : [],
-  <span class="hljs-string">"spatialDimensions"</span> : []
+  "dimensionExclusions" : [],
+  "spatialDimensions" : [],
+  "useSchemaDiscovery": true
 }
 </code></pre>
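By contrast, a minimal string-based schemaless dimensionsSpec is sketched below: useSchemaDiscovery is left at its default of false and the dimensions list is left empty, so every discovered dimension is ingested as a string:

    "dimensionsSpec" : {
      "dimensions": []
    }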
 <blockquote>
@@ -249,7 +254,8 @@ your ingestion spec.</p>
 <tr><td><code>dimensions</code></td><td>A list of <a href="#dimension-objects">dimension names or objects</a>. You cannot include the same column in both <code>dimensions</code> and <code>dimensionExclusions</code>.<br /><br />If <code>dimensions</code> and <code>spatialDimensions</code> are both null or empty arrays, Druid treats all columns other than timestamp or metrics that do not appear in <code>dimensionExclusions</code> as String-typed dimension columns. See <a href="#inclusions- [...]
 <tr><td><code>dimensionExclusions</code></td><td>The names of dimensions to exclude from ingestion. Only names are supported here, not objects.<br /><br />This list is only used if the <code>dimensions</code> and <code>spatialDimensions</code> lists are both null or empty arrays; otherwise it is ignored. See <a href="#inclusions-and-exclusions">inclusions and exclusions</a> below for details.</td><td><code>[]</code></td></tr>
 <tr><td><code>spatialDimensions</code></td><td>An array of <a href="/docs/26.0.0/development/geo.html">spatial dimensions</a>.</td><td><code>[]</code></td></tr>
-<tr><td><code>includeAllDimensions</code></td><td>You can set <code>includeAllDimensions</code> to true to ingest both explicit dimensions in the <code>dimensions</code> field and other dimensions that the ingestion task discovers from input data. In this case, the explicit dimensions will appear first in order that you specify them and the dimensions dynamically discovered will come after. This flag can be useful especially with auto schema discovery using <a href="./data-formats.html#f [...]
+<tr><td><code>includeAllDimensions</code></td><td>This field only applies to string-based schema discovery, where Druid ingests the dimensions it discovers as strings. This is different from schema auto-discovery, where Druid infers the type of the data. You can set <code>includeAllDimensions</code> to true to ingest both explicit dimensions in the <code>dimensions</code> field and other dimensions that the ingestion task discovers from input data. In this case, the explicit dimensions  [...]
+<tr><td><code>useSchemaDiscovery</code></td><td>Configures Druid to use schema auto-discovery to discover some or all of the dimensions and types for your data. Druid ingests any dimension that doesn't have a uniform type as JSON. You can use this with native batch or streaming ingestion.</td><td>false</td></tr>
 </tbody>
 </table>
 <h4><a class="anchor" aria-hidden="true" id="dimension-objects"></a><a href="#dimension-objects" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2- [...]
@@ -261,7 +267,7 @@ a <code>string</code> type dimension object with the given name, e.g. <code>&quo
 <tr><th>Field</th><th>Description</th><th>Default</th></tr>
 </thead>
 <tbody>
-<tr><td>type</td><td>Either <code>string</code>, <code>long</code>, <code>float</code>, <code>double</code>, or <code>json</code>.</td><td><code>string</code></td></tr>
+<tr><td>type</td><td>Either <code>auto</code>, <code>string</code>, <code>long</code>, <code>float</code>, <code>double</code>, or <code>json</code>. For the <code>auto</code> type, Druid determines the most appropriate type for the dimension and assigns one of the following: <code>STRING</code>, <code>ARRAY&lt;STRING&gt;</code>, <code>LONG</code>, <code>ARRAY&lt;LONG&gt;</code>, <code>DOUBLE</code>, <code>ARRAY&lt;DOUBLE&gt;</code>, or <code>COMPLEX&lt;json&gt;</code> columns, all sharing a common 'nested' format. When Druid infers the schema with schema auto-discovery, the type is <code>auto</code> [...]
 <tr><td>name</td><td>The name of the dimension. This will be used as the field name to read from input records, as well as the column name stored in generated segments.<br /><br />Note that you can use a <a href="#transformspec"><code>transformSpec</code></a> if you want to rename columns during ingestion time.</td><td>none (required)</td></tr>
 <tr><td>createBitmapIndex</td><td>For <code>string</code> typed dimensions, whether or not bitmap indexes should be created for the column in generated segments. Creating a bitmap index requires more storage, but speeds up certain kinds of filtering (especially equality and prefix filtering). Only supported for <code>string</code> typed dimensions.</td><td><code>true</code></td></tr>
 <tr><td>multiValueHandling</td><td>Specify the type of handling for <a href="/docs/26.0.0/querying/multi-value-dimensions.html">multi-value fields</a>. Possible values are <code>sorted_array</code>, <code>sorted_set</code>, and <code>array</code>. <code>sorted_array</code> and <code>sorted_set</code> order the array upon ingestion. <code>sorted_set</code> removes duplicates. <code>array</code> ingests data as-is</td><td><code>sorted_array</code></td></tr>
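As a sketch of the auto type described in this table, a dimension object in the dimensions list would look like the following; the field name is hypothetical:

    { "type": "auto", "name": "userAttributes" }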
@@ -270,6 +276,9 @@ a <code>string</code> type dimension object with the given name, e.g. <code>&quo
 <h4><a class="anchor" aria-hidden="true" id="inclusions-and-exclusions"></a><a href="#inclusions-and-exclusions" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c [...]
 <p>Druid will interpret a <code>dimensionsSpec</code> in two possible ways: <em>normal</em> or <em>schemaless</em>.</p>
 <p>Normal interpretation occurs when either <code>dimensions</code> or <code>spatialDimensions</code> is non-empty. In this case, the combination of the two lists will be taken as the set of dimensions to be ingested, and the list of <code>dimensionExclusions</code> will be ignored.</p>
+<blockquote>
+<p>The following description of schemaless interpretation refers to string-based schemaless ingestion, where Druid treats the dimensions it discovers as strings. We recommend using schema auto-discovery instead, where Druid infers the type of each dimension. For more information, see <a href="#dimensionsspec"><code>dimensionsSpec</code></a>.</p>
+</blockquote>
 <p>Schemaless interpretation occurs when both <code>dimensions</code> and <code>spatialDimensions</code> are empty or null. In this case, the set of dimensions is determined in the following way:</p>
 <ol>
 <li>First, start from the set of all root-level fields from the input record, as determined by the <a href="/docs/26.0.0/ingestion/data-formats.html"><code>inputFormat</code></a>. &quot;Root-level&quot; includes all fields at the top level of a data structure, but does not include fields nested within maps or lists. To extract these, you must use a <a href="/docs/26.0.0/ingestion/data-formats.html#flattenspec"><code>flattenSpec</code></a>. All fields of non-nested data formats, such as  [...]
@@ -280,6 +289,7 @@ a <code>string</code> type dimension object with the given name, e.g. <code>&quo
 <li>Any field with the same name as an aggregator from the <a href="#metricsspec">metricsSpec</a> is excluded.</li>
 <li>All other fields are ingested as <code>string</code> typed dimensions with the <a href="#dimension-objects">default settings</a>.</li>
 </ol>
+<p>Additionally, if you have empty columns that you want to include in string-based schemaless ingestion, set the context parameter <code>storeEmptyColumns</code> to <code>true</code>.</p>
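A sketch of that context parameter as it would appear in an ingestion task payload:

    "context" : {
      "storeEmptyColumns" : true
    }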
 <blockquote>
 <p>Note: Fields generated by a <a href="#transformspec"><code>transformSpec</code></a> are not currently considered candidates for
 schemaless dimension interpretation.</p>
diff --git a/docs/26.0.0/ingestion/schema-design.html b/docs/26.0.0/ingestion/schema-design.html
index 729429f5..f83541a2 100644
--- a/docs/26.0.0/ingestion/schema-design.html
+++ b/docs/26.0.0/ingestion/schema-design.html
@@ -150,7 +150,7 @@ to compute percentiles or quantiles, use Druid's <a href="/docs/26.0.0/querying/
 row in your Druid datasource. This can be useful if you want to store data at a different time granularity than it is
 naturally emitted. It is also useful if you want to combine timeseries and non-timeseries data in the same datasource.</li>
 <li>If you don't know ahead of time what columns you'll want to ingest, use an empty dimensions list to trigger
-<a href="#schema-less-dimensions">automatic detection of dimension columns</a>.</li>
+<a href="#schema-auto-discovery-for-dimensions">automatic detection of dimension columns</a>.</li>
 </ul>
 <h3><a class="anchor" aria-hidden="true" id="log-aggregation-model"></a><a href="#log-aggregation-model" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2 [...]
 <p>(Like Elasticsearch or Splunk.)</p>
@@ -160,8 +160,7 @@ developed. The main data modeling differences between Druid and these systems ar
 you must be more explicit. Druid columns have types specified upfront.</p>
 <p>Tips for modeling log data in Druid:</p>
 <ul>
-<li>If you don't know ahead of time what columns you'll want to ingest, use an empty dimensions list to trigger
-<a href="#schema-less-dimensions">automatic detection of dimension columns</a>.</li>
+<li>If you don't know ahead of time what columns to ingest, you can have Druid perform <a href="#schema-auto-discovery-for-dimensions">schema auto-discovery</a>.</li>
 <li>If you have nested data, you can ingest it using the <a href="/docs/26.0.0/querying/nested-columns.html">nested columns</a> feature or flatten it using a <a href="/docs/26.0.0/ingestion/ingestion-spec.html#flattenspec"><code>flattenSpec</code></a>.</li>
 <li>Consider enabling <a href="/docs/26.0.0/ingestion/rollup.html">rollup</a> if you have mainly analytical use cases for your log data. This will
 mean you lose the ability to retrieve individual events from Druid, but you potentially gain substantial compression and
@@ -241,10 +240,47 @@ the number of Druid rows for the time interval, which can be used to determine w
     { "type": "longSum", "name": "numIngestedEvents", "fieldName": "count" }
 ]
 </code></pre>
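For reference, the ingestion-time counterpart that produces such a "count" column is a count aggregator in the metricsSpec, sketched here:

    "metricsSpec" : [
      { "type": "count", "name": "count" }
    ]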
-<h3><a class="anchor" aria-hidden="true" id="schema-less-dimensions"></a><a href="#schema-less-dimensions" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0 [...]
-<p>If the <code>dimensions</code> field is left empty in your ingestion spec, Druid will treat every column that is not the timestamp column,
-a dimension that has been excluded, or a metric column as a dimension.</p>
-<p>Note that when using schema-less ingestion, all dimensions will be ingested as String-typed dimensions.</p>
+<h3><a class="anchor" aria-hidden="true" id="schema-auto-discovery-for-dimensions"></a><a href="#schema-auto-discovery-for-dimensions" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 [...]
+<p>Druid can infer the schema for your data in one of two ways:</p>
+<ul>
+<li><a href="#type-aware-schema-discovery">Type-aware schema discovery (experimental)</a> where Druid infers the schema and type for your data. Type-aware schema discovery is an experimental feature currently available for native batch and streaming ingestion.</li>
+<li><a href="#string-based-schema-discovery">String-based schema discovery</a> where all the discovered columns are typed as either native string or multi-value string columns.</li>
+</ul>
+<h4><a class="anchor" aria-hidden="true" id="type-aware-schema-discovery"></a><a href="#type-aware-schema-discovery" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 1 [...]
+<blockquote>
+<p>Note that using type-aware schema discovery can impact downstream BI tools depending on how they handle ARRAY typed columns.</p>
+</blockquote>
+<p>You can have Druid infer the schema and types for your data partially or fully by setting <code>dimensionsSpec.useSchemaDiscovery</code> to <code>true</code> and defining some, or none, of the dimensions in the dimensions list.</p>
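Taken to the fully automatic extreme, a dimensionsSpec that lets Druid discover every dimension and its type is just the following sketch:

    "dimensionsSpec" : {
      "useSchemaDiscovery": true
    }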
+<p>When performing type-aware schema discovery, Druid can discover all of the columns of your input data (that aren't in
+the exclusion list). Druid automatically chooses the most appropriate native Druid type among <code>STRING</code>, <code>LONG</code>,
+<code>DOUBLE</code>, <code>ARRAY&lt;STRING&gt;</code>, <code>ARRAY&lt;LONG&gt;</code>, <code>ARRAY&lt;DOUBLE&gt;</code>, or <code>COMPLEX&lt;json&gt;</code> for nested data. For input formats with
+native boolean types, Druid ingests these values as strings if <code>druid.expressions.useStrictBooleans</code> is set to <code>false</code>
+(the default), or longs if set to <code>true</code> (for more SQL compatible behavior). Array typed columns can be queried using
+the <a href="/docs/26.0.0/querying/sql-array-functions.html">array functions</a> or <a href="/docs/26.0.0/querying/sql-functions.html#unnest">UNNEST</a>. Nested
+columns can be queried with the <a href="/docs/26.0.0/querying/sql-json-functions.html">JSON functions</a>.</p>
+<p>Mixed type columns are stored in the <em>least</em> restrictive type that can represent all values in the column. For example:</p>
+<ul>
+<li>Mixed numeric columns are <code>DOUBLE</code>.</li>
+<li>If there are any strings present, then the column is a <code>STRING</code>.</li>
+<li>If there are arrays, then the column becomes an array with the least restrictive element type.</li>
+<li>Any nested data or arrays of nested data become <code>COMPLEX&lt;json&gt;</code> nested columns.</li>
+</ul>
+<p>If you're already using string-based schema discovery and want to migrate, see <a href="#migrating-to-type-aware-schema-discovery">Migrating to type-aware schema discovery</a>.</p>
+<h4><a class="anchor" aria-hidden="true" id="string-based-schema-discovery"></a><a href="#string-based-schema-discovery" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12  [...]
+<p>If you do not set <code>dimensionsSpec.useSchemaDiscovery</code> to <code>true</code>, Druid can still use string-based schema discovery for ingestion if either of the following conditions is met:</p>
+<ul>
+<li>The dimension list is empty</li>
+<li>You set <code>includeAllDimensions</code> to <code>true</code></li>
+</ul>
+<p>Druid coerces primitives and arrays of primitive types into the native Druid string type. Nested data structures and arrays of nested data structures are ignored and not ingested.</p>
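As a sketch of the second condition above, explicit dimensions can be combined with string-typed discovery of all remaining columns; the dimension name reuses the earlier example:

    "dimensionsSpec" : {
      "dimensions": ["page"],
      "includeAllDimensions": true
    }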
+<h4><a class="anchor" aria-hidden="true" id="migrating-to-type-aware-schema-discovery"></a><a href="#migrating-to-type-aware-schema-discovery" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0  [...]
+<p>If you previously used string-based schema discovery and want to migrate to type-aware schema discovery, do the following:</p>
+<ul>
+<li>Update any queries that use multi-value dimensions (MVDs) to use UNNEST in conjunction with other functions so that they no longer rely on MVD behavior. Type-aware schema discovery generates ARRAY typed columns instead of MVDs, so queries that use any MVD features will fail.</li>
+<li>Be aware of mixed-type inputs and test how type-aware schema discovery handles them. Druid attempts to cast them to the least restrictive type.</li>
+<li>If you notice issues with numeric types, you may need to explicitly cast them. Generally, Druid handles the coercion for you.</li>
+<li>Update your dimension exclusion list and add any nested columns if you want to continue to exclude them. String-based schema discovery automatically ignores nested columns, but type-aware schema discovery will ingest them.</li>
+</ul>
 <h3><a class="anchor" aria-hidden="true" id="including-the-same-column-as-a-dimension-and-a-metric"></a><a href="#including-the-same-column-as-a-dimension-and-a-metric" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2. [...]
 <p>One workflow with unique IDs is to be able to filter on a particular ID, while still being able to do fast unique counts on the ID column.
 If you are not using schema-less dimensions, this use case is supported by setting the <code>name</code> of the metric to something different from the dimension.
@@ -257,7 +293,7 @@ some work at ETL time.</p>
 <pre><code class="hljs css language-json">{ <span class="hljs-attr">"type"</span>: <span class="hljs-string">"hyperUnique"</span>, <span class="hljs-attr">"name"</span>: <span class="hljs-string">"devices"</span>, <span class="hljs-attr">"fieldName"</span>: <span class="hljs-string">"device_id_met"</span> }
 </code></pre>
 <p><code>device_id_dim</code> should automatically get picked up as a dimension.</p>
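In other words, the ETL step duplicates the value under both names, so an input row would look like this sketch (the values are hypothetical):

    { "device_id_dim": "abc123", "device_id_met": "abc123" }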
-</span></div></article></div><div class="docs-prevnext"><a class="docs-prev button" href="/docs/26.0.0/ingestion/ingestion-spec.html"><span class="arrow-prev">← </span><span>Ingestion spec</span></a><a class="docs-next button" href="/docs/26.0.0/development/extensions-core/kafka-ingestion.html"><span>Apache Kafka ingestion</span><span class="arrow-next"> →</span></a></div></div></div><nav class="onPageNav"><ul class="toc-headings"><li><a href="#druids-data-model">Druid's data model</a></ [...]
+</span></div></article></div><div class="docs-prevnext"><a class="docs-prev button" href="/docs/26.0.0/ingestion/ingestion-spec.html"><span class="arrow-prev">← </span><span>Ingestion spec</span></a><a class="docs-next button" href="/docs/26.0.0/development/extensions-core/kafka-ingestion.html"><span>Apache Kafka ingestion</span><span class="arrow-next"> →</span></a></div></div></div><nav class="onPageNav"><ul class="toc-headings"><li><a href="#druids-data-model">Druid's data model</a></ [...]
                 document.addEventListener('keyup', function(e) {
                   if (e.target !== document.body) {
                     return;
diff --git a/docs/26.0.0/ingestion/tasks.html b/docs/26.0.0/ingestion/tasks.html
index f16ebf36..0ac712e2 100644
--- a/docs/26.0.0/ingestion/tasks.html
+++ b/docs/26.0.0/ingestion/tasks.html
@@ -387,7 +387,7 @@ The settings get passed into the <code>context</code> field of the compaction ta
 <tr><td><code>forceTimeChunkLock</code></td><td>true</td><td><em>Setting this to false is still experimental</em><br/> Force to always use time chunk lock. If not set, each task automatically chooses a lock type to use. If set, this parameter overwrites <code>druid.indexer.tasklock.forceTimeChunkLock</code> <a href="/docs/26.0.0/configuration/index.html#overlord-operations">configuration for the overlord</a>. See <a href="#locking">Locking</a> for more details.</td></tr>
 <tr><td><code>priority</code></td><td>Different based on task types. See <a href="#priority">Priority</a>.</td><td>Task priority</td></tr>
 <tr><td><code>useLineageBasedSegmentAllocation</code></td><td>false in 0.21 or earlier, true in 0.22 or later</td><td>Enable the new lineage-based segment allocation protocol for the native Parallel task with dynamic partitioning. This option should be off during the replacing rolling upgrade from one of the Druid versions between 0.19 and 0.21 to Druid 0.22 or higher. Once the upgrade is done, it must be set to true to ensure data correctness.</td></tr>
-<tr><td><code>storeEmptyColumns</code></td><td>true</td><td>Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the <a href="/docs/26.0.0/ingestion/ingestion-spec.html#dimensionsspec"><code>dimensionsSpec</code></a>. If you use schemaless ingestion and don't specify any dimensions to ingest, you must also set <a href="/docs/26.0.0/ingestion/ingestion-spec.html#dimensionsspec"><code>includeAllDimensions</code>< [...]
+<tr><td><code>storeEmptyColumns</code></td><td>true</td><td>Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the <a href="/docs/26.0.0/ingestion/ingestion-spec.html#dimensionsspec"><code>dimensionsSpec</code></a>. <br/><br/>If you set <code>storeEmptyColumns</code> to false, Druid SQL queries referencing empty columns will fail. If you intend to leave <code>storeEmptyColumns</code> disabled, you should eith [...]
 </tbody>
 </table>
 <h2><a class="anchor" aria-hidden="true" id="task-logs"></a><a href="#task-logs" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.6 [...]
diff --git a/docs/latest/configuration/index.html b/docs/latest/configuration/index.html
index 31567f31..52695509 100644
--- a/docs/latest/configuration/index.html
+++ b/docs/latest/configuration/index.html
@@ -1088,7 +1088,7 @@ However, if you need to do it via HTTP, the JSON object can be submitted to the
 <tr><td><code>killDataSourceWhitelist</code></td><td>List of specific data sources for which kill tasks are sent if property <code>druid.coordinator.kill.on</code> is true. This can be a list of comma-separated data source names or a JSON array.</td><td>none</td></tr>
 <tr><td><code>killPendingSegmentsSkipList</code></td><td>List of data sources for which pendingSegments are <em>NOT</em> cleaned up if property <code>druid.coordinator.kill.pendingSegments.on</code> is true. This can be a list of comma-separated data sources or a JSON array.</td><td>none</td></tr>
 <tr><td><code>maxSegmentsInNodeLoadingQueue</code></td><td>The maximum number of segments that can be queued for loading to any given server. This parameter can be used to speed up the segment loading process, especially if there are &quot;slow&quot; nodes in the cluster (with low loading speed) or if too many segments are scheduled to be replicated to a particular node (faster loading may be preferred over better segment distribution). Desired value depends on segments loading speed, a [...]
-<tr><td><code>useRoundRobinSegmentAssignment</code></td><td>Boolean flag for whether segments should be assigned to historicals in a round robin fashion. When disabled, segment assignment is done using the chosen balancer strategy. When enabled, this can speed up segment assignments leaving balancing to move the segments to their optimal locations (based on the balancer strategy) lazily.</td><td>false</td></tr>
+<tr><td><code>useRoundRobinSegmentAssignment</code></td><td>Boolean flag for whether segments should be assigned to historicals in a round-robin fashion. When disabled, segment assignment is done using the chosen balancer strategy. When enabled, this can speed up segment assignment, leaving the balancing process to lazily move segments to their optimal locations (based on the balancer strategy).</td><td>true</td></tr>
 <tr><td><code>decommissioningNodes</code></td><td>List of historical servers to 'decommission'. Coordinator will not assign new segments to 'decommissioning' servers,  and segments will be moved away from them to be placed on non-decommissioning servers at the maximum rate specified by <code>decommissioningMaxPercentOfMaxSegmentsToMove</code>.</td><td>none</td></tr>
 <tr><td><code>decommissioningMaxPercentOfMaxSegmentsToMove</code></td><td>Upper limit of segments the Coordinator can move from decommissioning servers to active non-decommissioning servers during a single run. This value is relative to the total maximum number of segments that can be moved at any given time based upon the value of <code>maxSegmentsToMove</code>.<br /><br />If <code>decommissioningMaxPercentOfMaxSegmentsToMove</code> is 0, the Coordinator does not move segments to decomm [...]
 <tr><td><code>pauseCoordination</code></td><td>Boolean flag for whether or not the coordinator should execute its various duties of coordinating the cluster. Setting this to true essentially pauses all coordination work while allowing the API to remain up. Duties that are paused include all classes that implement the <code>CoordinatorDuty</code> Interface. Such duties include: Segment balancing, Segment compaction, Emission of metrics controlled by the dynamic coordinator config <code>em [...]
@@ -1248,7 +1248,7 @@ The below is a list of the supported configurations for auto-compaction.</p>
 <tr><td><code>druid.indexer.storage.type</code></td><td>Choices are &quot;local&quot; or &quot;metadata&quot;. Indicates whether incoming tasks should be stored locally (in heap) or in metadata storage. &quot;local&quot; is mainly for internal testing while &quot;metadata&quot; is recommended in production because storing incoming tasks in metadata storage allows for tasks to be resumed if the Overlord should fail.</td><td>local</td></tr>
 <tr><td><code>druid.indexer.storage.recentlyFinishedThreshold</code></td><td>Duration of time to store task results. Default is 24 hours. If you have hundreds of tasks running in a day, consider increasing this threshold.</td><td>PT24H</td></tr>
 <tr><td><code>druid.indexer.tasklock.forceTimeChunkLock</code></td><td><em><strong>Setting this to false is still experimental</strong></em><br/> If set, all tasks are enforced to use time chunk lock. If not set, each task automatically chooses a lock type to use. This configuration can be overwritten by setting <code>forceTimeChunkLock</code> in the <a href="/docs/latest/ingestion/tasks.html#context">task context</a>. See <a href="/docs/latest/ingestion/tasks.html#context">Task Locking  [...]
-<tr><td><code>druid.indexer.tasklock.batchSegmentAllocation</code></td><td>If set to true, Druid performs segment allocate actions in batches to improve throughput and reduce the average <code>task/action/run/time</code>. See <a href="/docs/latest/ingestion/tasks.html#batching-segmentallocate-actions">batching <code>segmentAllocate</code> actions</a> for details.</td><td>false</td></tr>
+<tr><td><code>druid.indexer.tasklock.batchSegmentAllocation</code></td><td>If set to true, Druid performs segment allocate actions in batches to improve throughput and reduce the average <code>task/action/run/time</code>. See <a href="/docs/latest/ingestion/tasks.html#batching-segmentallocate-actions">batching <code>segmentAllocate</code> actions</a> for details.</td><td>true</td></tr>
 <tr><td><code>druid.indexer.tasklock.batchAllocationWaitTime</code></td><td>Number of milliseconds after Druid adds the first segment allocate action to a batch, until it executes the batch. Allows the batch to add more requests and improve the average segment allocation run time. This configuration takes effect only if <code>batchSegmentAllocation</code> is enabled.</td><td>500</td></tr>
 <tr><td><code>druid.indexer.task.default.context</code></td><td>Default task context that is applied to all tasks submitted to the Overlord. Any default in this config overrides neither the context values the user provides nor <code>druid.indexer.tasklock.forceTimeChunkLock</code>.</td><td>empty context</td></tr>
 <tr><td><code>druid.indexer.queue.maxSize</code></td><td>Maximum number of active tasks at one time.</td><td>Integer.MAX_VALUE</td></tr>
@@ -1661,7 +1661,7 @@ ensure at least this amount of direct memory is available by providing <code>-XX
 <tr><td><code>druid.indexer.task.hadoopWorkingPath</code></td><td>Temporary working directory for Hadoop tasks.</td><td><code>/tmp/druid-indexing</code></td></tr>
 <tr><td><code>druid.indexer.task.restoreTasksOnRestart</code></td><td>If true, MiddleManagers will attempt to stop tasks gracefully on shutdown and restore them on restart.</td><td>false</td></tr>
 <tr><td><code>druid.indexer.task.ignoreTimestampSpecForDruidInputSource</code></td><td>If true, tasks using the <a href="/docs/latest/ingestion/native-batch-input-sources.html">Druid input source</a> will ignore the provided timestampSpec, and will use the <code>__time</code> column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.</td><td>false</td></tr>
-<tr><td><code>druid.indexer.task.storeEmptyColumns</code></td><td>Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the <a href="/docs/latest/ingestion/ingestion-spec.html#dimensionsspec"><code>dimensionsSpec</code></a>. If you use schemaless ingestion and don't specify any dimensions to ingest, you must also set <a href="/docs/latest/ingestion/ingestion-spec.html#dimensionsspec"><code>includeAllDimensions</ [...]
+<tr><td><code>druid.indexer.task.storeEmptyColumns</code></td><td>Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the <a href="/docs/latest/ingestion/ingestion-spec.html#dimensionsspec"><code>dimensionsSpec</code></a>. If you use string-based schemaless ingestion and don't specify any dimensions to ingest, you must also set <a href="/docs/latest/ingestion/ingestion-spec.html#dimensionsspec"><code>inclu [...]
 <tr><td><code>druid.indexer.task.tmpStorageBytesPerTask</code></td><td>Maximum number of bytes per task to be used to store temporary files on disk. This usage is split among all temporary storage usages for the task. An exception might be thrown if this limit is too low for the task or if this limit would be exceeded. This limit is currently respected only by MSQ tasks. Other types of tasks might exceed this limit. A value of -1 disables this limit.</td><td>-1</td></tr>
 <tr><td><code>druid.indexer.server.maxChatRequests</code></td><td>Maximum number of concurrent requests served by a task's chat handler. Set to 0 to disable limiting.</td><td>0</td></tr>
 </tbody>
@@ -1735,7 +1735,7 @@ then the value from the configuration below is used:</p>
 <tr><td><code>druid.indexer.task.hadoopWorkingPath</code></td><td>Temporary working directory for Hadoop tasks.</td><td><code>/tmp/druid-indexing</code></td></tr>
 <tr><td><code>druid.indexer.task.restoreTasksOnRestart</code></td><td>If true, the Indexer will attempt to stop tasks gracefully on shutdown and restore them on restart.</td><td>false</td></tr>
 <tr><td><code>druid.indexer.task.ignoreTimestampSpecForDruidInputSource</code></td><td>If true, tasks using the <a href="/docs/latest/ingestion/native-batch-input-sources.html">Druid input source</a> will ignore the provided timestampSpec, and will use the <code>__time</code> column of the input datasource. This option is provided for compatibility with ingestion specs written before Druid 0.22.0.</td><td>false</td></tr>
-<tr><td><code>druid.indexer.task.storeEmptyColumns</code></td><td>Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the <a href="/docs/latest/ingestion/ingestion-spec.html#dimensionsspec"><code>dimensionsSpec</code></a>. If you use schemaless ingestion and don't specify any dimensions to ingest, you must also set <a href="/docs/latest/ingestion/ingestion-spec.html#dimensionsspec"><code>includeAllDimensions</ [...]
+<tr><td><code>druid.indexer.task.storeEmptyColumns</code></td><td>Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the <a href="/docs/latest/ingestion/ingestion-spec.html#dimensionsspec"><code>dimensionsSpec</code></a>. <br/><br/>If you set <code>storeEmptyColumns</code> to false, Druid SQL queries referencing empty columns will fail. If you intend to leave <code>storeEmptyColumns</code> disabled, you shoul [...]
 <tr><td><code>druid.peon.taskActionClient.retry.minWait</code></td><td>The minimum retry time to communicate with Overlord.</td><td>PT5S</td></tr>
 <tr><td><code>druid.peon.taskActionClient.retry.maxWait</code></td><td>The maximum retry time to communicate with Overlord.</td><td>PT1M</td></tr>
 <tr><td><code>druid.peon.taskActionClient.retry.maxRetryCount</code></td><td>The maximum number of retries to communicate with Overlord.</td><td>60</td></tr>
diff --git a/docs/latest/ingestion/ingestion-spec.html b/docs/latest/ingestion/ingestion-spec.html
index 2d73c4d6..104ad2fd 100644
--- a/docs/latest/ingestion/ingestion-spec.html
+++ b/docs/latest/ingestion/ingestion-spec.html
@@ -77,7 +77,7 @@
   ~ specific language governing permissions and limitations
   ~ under the License.
   -->
-<p>All ingestion methods use ingestion tasks to load data into Druid. Streaming ingestion uses ongoing supervisors that run and supervise a set of tasks over time. Native batch and Hadoop-based ingestion use a one-time <a href="/docs/latest/ingestion/tasks.html">task</a>. All types of ingestion use an <em>ingestion spec</em> to configure ingestion.</p>
+<p>All ingestion methods use ingestion tasks to load data into Druid. Streaming ingestion uses ongoing supervisors that run and supervise a set of tasks over time. Native batch and Hadoop-based ingestion use a one-time <a href="/docs/latest/ingestion/tasks.html">task</a>. Except for SQL-based ingestion, you use an <em>ingestion spec</em> to configure your ingestion.</p>
 <p>Ingestion specs consist of three main components:</p>
 <ul>
 <li><a href="#dataschema"><code>dataSchema</code></a>, which configures the <a href="#datasource">datasource name</a>,
@@ -223,15 +223,20 @@ your ingestion spec.</p>
 <p>Treat <code>__time</code> as a millisecond timestamp: the number of milliseconds since Jan 1, 1970 at midnight UTC.</p>
 <h3><a class="anchor" aria-hidden="true" id="dimensionsspec"></a><a href="#dimensionsspec" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0- [...]
 <p>The <code>dimensionsSpec</code> is located in <code>dataSchema</code> → <code>dimensionsSpec</code> and is responsible for
-configuring <a href="/docs/latest/ingestion/data-model.html#dimensions">dimensions</a>. An example <code>dimensionsSpec</code> is:</p>
-<pre><code class="hljs"><span class="hljs-string">"dimensionsSpec"</span> : {
-  <span class="hljs-string">"dimensions"</span>: [
-    <span class="hljs-string">"page"</span>,
-    <span class="hljs-string">"language"</span>,
-    { <span class="hljs-string">"type"</span>: <span class="hljs-string">"long"</span>, <span class="hljs-string">"name"</span>: <span class="hljs-string">"userId"</span> }
+configuring <a href="/docs/latest/ingestion/data-model.html#dimensions">dimensions</a>.</p>
+<p>You can either specify the dimensions manually or take advantage of schema auto-discovery, where Druid infers all or some of the schema for your data. This means that you don't have to explicitly specify each dimension and its type.</p>
+<p>To use schema auto-discovery, set <code>useSchemaDiscovery</code> to <code>true</code>.</p>
+<p>Alternatively, you can use string-based schemaless ingestion, where any discovered dimensions are treated as strings. To do so, leave <code>useSchemaDiscovery</code> set to <code>false</code> (the default). Then, either leave the dimensions list empty or set the <code>includeAllDimensions</code> property to <code>true</code>.</p>
+<p>The following <code>dimensionsSpec</code> example uses schema auto-discovery (<code>&quot;useSchemaDiscovery&quot;: true</code>) in conjunction with explicitly defined dimensions to have Druid infer some of the schema for the data:</p>
+<pre><code class="hljs css language-json">"dimensionsSpec" : {
+  "dimensions": [
+    "page",
+    "language",
+    { "type": "long", "name": "userId" }
   ],
-  <span class="hljs-string">"dimensionExclusions"</span> : [],
-  <span class="hljs-string">"spatialDimensions"</span> : []
+  "dimensionExclusions" : [],
+  "spatialDimensions" : [],
+  "useSchemaDiscovery": true
 }
 </code></pre>
 <blockquote>
@@ -249,7 +254,8 @@ your ingestion spec.</p>
 <tr><td><code>dimensions</code></td><td>A list of <a href="#dimension-objects">dimension names or objects</a>. You cannot include the same column in both <code>dimensions</code> and <code>dimensionExclusions</code>.<br /><br />If <code>dimensions</code> and <code>spatialDimensions</code> are both null or empty arrays, Druid treats all columns other than timestamp or metrics that do not appear in <code>dimensionExclusions</code> as String-typed dimension columns. See <a href="#inclusions- [...]
 <tr><td><code>dimensionExclusions</code></td><td>The names of dimensions to exclude from ingestion. Only names are supported here, not objects.<br /><br />This list is only used if the <code>dimensions</code> and <code>spatialDimensions</code> lists are both null or empty arrays; otherwise it is ignored. See <a href="#inclusions-and-exclusions">inclusions and exclusions</a> below for details.</td><td><code>[]</code></td></tr>
 <tr><td><code>spatialDimensions</code></td><td>An array of <a href="/docs/latest/development/geo.html">spatial dimensions</a>.</td><td><code>[]</code></td></tr>
-<tr><td><code>includeAllDimensions</code></td><td>You can set <code>includeAllDimensions</code> to true to ingest both explicit dimensions in the <code>dimensions</code> field and other dimensions that the ingestion task discovers from input data. In this case, the explicit dimensions will appear first in order that you specify them and the dimensions dynamically discovered will come after. This flag can be useful especially with auto schema discovery using <a href="./data-formats.html#f [...]
+<tr><td><code>includeAllDimensions</code></td><td>This field only applies to string-based schema discovery, where Druid ingests the dimensions it discovers as strings. This is different from schema auto-discovery, where Druid infers the type of the data. You can set <code>includeAllDimensions</code> to true to ingest both explicit dimensions in the <code>dimensions</code> field and other dimensions that the ingestion task discovers from input data. In this case, the explicit dimensions  [...]
+<tr><td><code>useSchemaDiscovery</code></td><td>Configure Druid to use schema auto-discovery to discover some or all of the dimensions and types for your data. Druid ingests any dimension that doesn't have a uniform type as JSON. You can use this for native batch or streaming ingestion.</td><td>false</td></tr>
 </tbody>
 </table>
 <h4><a class="anchor" aria-hidden="true" id="dimension-objects"></a><a href="#dimension-objects" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2- [...]
@@ -261,7 +267,7 @@ a <code>string</code> type dimension object with the given name, e.g. <code>&quo
 <tr><th>Field</th><th>Description</th><th>Default</th></tr>
 </thead>
 <tbody>
-<tr><td>type</td><td>Either <code>string</code>, <code>long</code>, <code>float</code>, <code>double</code>, or <code>json</code>.</td><td><code>string</code></td></tr>
+<tr><td>type</td><td>Either <code>auto</code>, <code>string</code>, <code>long</code>, <code>float</code>, <code>double</code>, or <code>json</code>. For the <code>auto</code> type, Druid determines the most appropriate type for the dimension and assigns one of the following: STRING, ARRAY&lt;STRING&gt;, LONG, ARRAY&lt;LONG&gt;, DOUBLE, ARRAY&lt;DOUBLE&gt;, or COMPLEX&lt;json&gt; columns, all sharing a common 'nested' format. When Druid infers the schema with schema auto-discovery, the type is <code>auto</code> [...]
 <tr><td>name</td><td>The name of the dimension. This will be used as the field name to read from input records, as well as the column name stored in generated segments.<br /><br />Note that you can use a <a href="#transformspec"><code>transformSpec</code></a> if you want to rename columns during ingestion time.</td><td>none (required)</td></tr>
 <tr><td>createBitmapIndex</td><td>For <code>string</code> typed dimensions, whether or not bitmap indexes should be created for the column in generated segments. Creating a bitmap index requires more storage, but speeds up certain kinds of filtering (especially equality and prefix filtering). Only supported for <code>string</code> typed dimensions.</td><td><code>true</code></td></tr>
 <tr><td>multiValueHandling</td><td>Specify the type of handling for <a href="/docs/latest/querying/multi-value-dimensions.html">multi-value fields</a>. Possible values are <code>sorted_array</code>, <code>sorted_set</code>, and <code>array</code>. <code>sorted_array</code> and <code>sorted_set</code> order the array upon ingestion. <code>sorted_set</code> removes duplicates. <code>array</code> ingests data as-is.</td><td><code>sorted_array</code></td></tr>
@@ -270,6 +276,9 @@ a <code>string</code> type dimension object with the given name, e.g. <code>&quo
 <h4><a class="anchor" aria-hidden="true" id="inclusions-and-exclusions"></a><a href="#inclusions-and-exclusions" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c [...]
 <p>Druid will interpret a <code>dimensionsSpec</code> in two possible ways: <em>normal</em> or <em>schemaless</em>.</p>
 <p>Normal interpretation occurs when either <code>dimensions</code> or <code>spatialDimensions</code> is non-empty. In this case, the combination of the two lists will be taken as the set of dimensions to be ingested, and the list of <code>dimensionExclusions</code> will be ignored.</p>
+<blockquote>
+<p>The following description of schemaless ingestion refers to string-based schemaless ingestion, where Druid treats discovered dimensions as strings. We recommend that you use schema auto-discovery instead, where Druid infers the type for each dimension. For more information, see <a href="#dimensionsspec"><code>dimensionsSpec</code></a>.</p>
+</blockquote>
 <p>Schemaless interpretation occurs when both <code>dimensions</code> and <code>spatialDimensions</code> are empty or null. In this case, the set of dimensions is determined in the following way:</p>
 <ol>
 <li>First, start from the set of all root-level fields from the input record, as determined by the <a href="/docs/latest/ingestion/data-formats.html"><code>inputFormat</code></a>. &quot;Root-level&quot; includes all fields at the top level of a data structure, but does not include fields nested within maps or lists. To extract these, you must use a <a href="/docs/latest/ingestion/data-formats.html#flattenspec"><code>flattenSpec</code></a>. All fields of non-nested data formats, such as  [...]
@@ -280,6 +289,7 @@ a <code>string</code> type dimension object with the given name, e.g. <code>&quo
 <li>Any field with the same name as an aggregator from the <a href="#metricsspec">metricsSpec</a> is excluded.</li>
 <li>All other fields are ingested as <code>string</code> typed dimensions with the <a href="#dimension-objects">default settings</a>.</li>
 </ol>
+<p>Additionally, if you have empty columns that you want to include in string-based schemaless ingestion, set the context parameter <code>storeEmptyColumns</code> to <code>true</code>.</p>
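+<p>For example, a task payload might carry the parameter in its <code>context</code> object, as in this minimal sketch (all other task fields omitted):</p>
+<pre><code class="hljs css language-json">"context": {
+  "storeEmptyColumns": true
+}
+</code></pre>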
 <blockquote>
 <p>Note: Fields generated by a <a href="#transformspec"><code>transformSpec</code></a> are not currently considered candidates for
 schemaless dimension interpretation.</p>
diff --git a/docs/latest/ingestion/schema-design.html b/docs/latest/ingestion/schema-design.html
index 0fb7fb65..2405475f 100644
--- a/docs/latest/ingestion/schema-design.html
+++ b/docs/latest/ingestion/schema-design.html
@@ -150,7 +150,7 @@ to compute percentiles or quantiles, use Druid's <a href="/docs/latest/querying/
 row in your Druid datasource. This can be useful if you want to store data at a different time granularity than it is
 naturally emitted. It is also useful if you want to combine timeseries and non-timeseries data in the same datasource.</li>
 <li>If you don't know ahead of time what columns you'll want to ingest, use an empty dimensions list to trigger
-<a href="#schema-less-dimensions">automatic detection of dimension columns</a>.</li>
+<a href="#schema-auto-discovery-for-dimensions">automatic detection of dimension columns</a>.</li>
 </ul>
 <h3><a class="anchor" aria-hidden="true" id="log-aggregation-model"></a><a href="#log-aggregation-model" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2 [...]
 <p>(Like Elasticsearch or Splunk.)</p>
@@ -160,8 +160,7 @@ developed. The main data modeling differences between Druid and these systems ar
 you must be more explicit. Druid columns have types specified upfront.</p>
 <p>Tips for modeling log data in Druid:</p>
 <ul>
-<li>If you don't know ahead of time what columns you'll want to ingest, use an empty dimensions list to trigger
-<a href="#schema-less-dimensions">automatic detection of dimension columns</a>.</li>
+<li>If you don't know ahead of time what columns to ingest, you can have Druid perform <a href="#schema-auto-discovery-for-dimensions">schema auto-discovery</a>.</li>
 <li>If you have nested data, you can ingest it using the <a href="/docs/latest/querying/nested-columns.html">nested columns</a> feature or flatten it using a <a href="/docs/latest/ingestion/ingestion-spec.html#flattenspec"><code>flattenSpec</code></a>.</li>
 <li>Consider enabling <a href="/docs/latest/ingestion/rollup.html">rollup</a> if you have mainly analytical use cases for your log data. This will
 mean you lose the ability to retrieve individual events from Druid, but you potentially gain substantial compression and
@@ -241,10 +240,47 @@ the number of Druid rows for the time interval, which can be used to determine w
     { "type": "longSum", "name": "numIngestedEvents", "fieldName": "count" }
 ]
 </code></pre>
-<h3><a class="anchor" aria-hidden="true" id="schema-less-dimensions"></a><a href="#schema-less-dimensions" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0 [...]
-<p>If the <code>dimensions</code> field is left empty in your ingestion spec, Druid will treat every column that is not the timestamp column,
-a dimension that has been excluded, or a metric column as a dimension.</p>
-<p>Note that when using schema-less ingestion, all dimensions will be ingested as String-typed dimensions.</p>
+<h3><a class="anchor" aria-hidden="true" id="schema-auto-discovery-for-dimensions"></a><a href="#schema-auto-discovery-for-dimensions" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 [...]
+<p>Druid can infer the schema for your data in one of two ways:</p>
+<ul>
+<li><a href="#type-aware-schema-discovery">Type-aware schema discovery (experimental)</a> where Druid infers the schema and type for your data. Type-aware schema discovery is an experimental feature currently available for native batch and streaming ingestion.</li>
+<li><a href="#string-based-schema-discovery">String-based schema discovery</a> where all the discovered columns are typed as either native string or multi-value string columns.</li>
+</ul>
+<h4><a class="anchor" aria-hidden="true" id="type-aware-schema-discovery"></a><a href="#type-aware-schema-discovery" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 1 [...]
+<blockquote>
+<p>Note that using type-aware schema discovery can impact downstream BI tools depending on how they handle ARRAY typed columns.</p>
+</blockquote>
+<p>You can have Druid infer the schema and types for your data partially or fully by setting <code>dimensionsSpec.useSchemaDiscovery</code> to <code>true</code> and defining some or no dimensions in the dimensions list.</p>
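+<p>As a minimal sketch, the following <code>dimensionsSpec</code> enables type-aware schema discovery with no explicitly defined dimensions, so Druid discovers every column and infers its type:</p>
+<pre><code class="hljs css language-json">"dimensionsSpec" : {
+  "dimensions": [],
+  "useSchemaDiscovery": true
+}
+</code></pre>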
+<p>When performing type-aware schema discovery, Druid can discover all of the columns of your input data (that aren't in
+the exclusion list). Druid automatically chooses the most appropriate native Druid type among <code>STRING</code>, <code>LONG</code>,
+<code>DOUBLE</code>, <code>ARRAY&lt;STRING&gt;</code>, <code>ARRAY&lt;LONG&gt;</code>, <code>ARRAY&lt;DOUBLE&gt;</code>, or <code>COMPLEX&lt;json&gt;</code> for nested data. For input formats with
+native boolean types, Druid ingests these values as strings if <code>druid.expressions.useStrictBooleans</code> is set to <code>false</code>
+(the default), or longs if set to <code>true</code> (for more SQL compatible behavior). Array typed columns can be queried using
+the <a href="/docs/latest/querying/sql-array-functions.html">array functions</a> or <a href="/docs/latest/querying/sql-functions.html#unnest">UNNEST</a>. Nested
+columns can be queried with the <a href="/docs/latest/querying/sql-json-functions.html">JSON functions</a>.</p>
+<p>Mixed-type columns are stored in the <em>least</em> restrictive type that can represent all values in the column (see the sketch after this list). For example:</p>
+<ul>
+<li>Mixed numeric columns are <code>DOUBLE</code></li>
+<li>If there are any strings present, then the column is a <code>STRING</code></li>
+<li>If there are arrays, then the column becomes an array with the least restrictive element type</li>
+<li>Any nested data or arrays of nested data become <code>COMPLEX&lt;json&gt;</code> nested columns.</li>
+</ul>
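+<p>To make the coercion rules concrete, consider a hypothetical input column named <code>value</code> whose rows contain a long, a double, and a string. Because a string is present, the whole column is stored as <code>STRING</code>:</p>
+<pre><code class="hljs css language-json">[
+  {"value": 1},
+  {"value": 2.5},
+  {"value": "unknown"}
+]
+</code></pre>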
+<p>If you're already using string-based schema discovery and want to migrate, see <a href="#migrating-to-type-aware-schema-discovery">Migrating to type-aware schema discovery</a>.</p>
+<h4><a class="anchor" aria-hidden="true" id="string-based-schema-discovery"></a><a href="#string-based-schema-discovery" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12  [...]
+<p>If you do not set <code>dimensionsSpec.useSchemaDiscovery</code> to <code>true</code>, Druid can still use string-based schema discovery for ingestion if either of the following conditions is met:</p>
+<ul>
+<li>The dimensions list is empty</li>
+<li>You set <code>includeAllDimensions</code> to <code>true</code></li>
+</ul>
+<p>Druid coerces primitives and arrays of primitive types into the native Druid string type. Nested data structures and arrays of nested data structures are ignored and not ingested.</p>
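+<p>For example, given a hypothetical input record like the sketch below, string-based schema discovery ingests <code>page</code> and <code>tags</code> as strings (the array becomes a multi-value string) but skips the nested <code>session</code> object entirely:</p>
+<pre><code class="hljs css language-json">{
+  "page": "index.html",
+  "tags": ["a", "b"],
+  "session": {"id": "xyz", "durationMs": 1200}
+}
+</code></pre>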
+<h4><a class="anchor" aria-hidden="true" id="migrating-to-type-aware-schema-discovery"></a><a href="#migrating-to-type-aware-schema-discovery" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0  [...]
+<p>If you previously used string-based schema discovery and want to migrate to type-aware schema discovery, do the following:</p>
+<ul>
+<li>Update any queries that use multi-value dimensions (MVDs) to use UNNEST in conjunction with other functions so that they no longer rely on MVD behavior. Type-aware schema discovery generates ARRAY typed columns instead of MVDs, so queries that rely on any MVD features will fail.</li>
+<li>Be aware of mixed-type inputs and test how type-aware schema discovery handles them. Druid attempts to coerce them to the least restrictive type.</li>
+<li>If you notice issues with numeric types, you may need to explicitly cast them. Generally, Druid handles the coercion for you.</li>
+<li>Update your dimension exclusion list and add any nested columns if you want to continue to exclude them, as shown in the sketch after this list. String-based schema discovery automatically ignores nested columns, but type-aware schema discovery will ingest them.</li>
+</ul>
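+<p>For example, if your data contains a nested object that string-based schema discovery previously skipped, such as the hypothetical <code>session</code> column above, you can keep excluding it like this sketch:</p>
+<pre><code class="hljs css language-json">"dimensionsSpec" : {
+  "dimensions": [],
+  "dimensionExclusions" : ["session"],
+  "useSchemaDiscovery": true
+}
+</code></pre>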
 <h3><a class="anchor" aria-hidden="true" id="including-the-same-column-as-a-dimension-and-a-metric"></a><a href="#including-the-same-column-as-a-dimension-and-a-metric" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2. [...]
 <p>One workflow with unique IDs is to filter on a particular ID while still being able to do fast unique counts on the ID column.
 If you are not using schema-less dimensions, this use case is supported by setting the <code>name</code> of the metric to something different from the dimension.
@@ -257,7 +293,7 @@ some work at ETL time.</p>
 <pre><code class="hljs css language-json">{ <span class="hljs-attr">"type"</span>: <span class="hljs-string">"hyperUnique"</span>, <span class="hljs-attr">"name"</span>: <span class="hljs-string">"devices"</span>, <span class="hljs-attr">"fieldName"</span>: <span class="hljs-string">"device_id_met"</span> }
 </code></pre>
 <p><code>device_id_dim</code> should automatically get picked up as a dimension.</p>
-</span></div></article></div><div class="docs-prevnext"><a class="docs-prev button" href="/docs/latest/ingestion/ingestion-spec.html"><span class="arrow-prev">← </span><span>Ingestion spec</span></a><a class="docs-next button" href="/docs/latest/development/extensions-core/kafka-ingestion.html"><span>Apache Kafka ingestion</span><span class="arrow-next"> →</span></a></div></div></div><nav class="onPageNav"><ul class="toc-headings"><li><a href="#druids-data-model">Druid's data model</a></ [...]
+</span></div></article></div><div class="docs-prevnext"><a class="docs-prev button" href="/docs/latest/ingestion/ingestion-spec.html"><span class="arrow-prev">← </span><span>Ingestion spec</span></a><a class="docs-next button" href="/docs/latest/development/extensions-core/kafka-ingestion.html"><span>Apache Kafka ingestion</span><span class="arrow-next"> →</span></a></div></div></div><nav class="onPageNav"><ul class="toc-headings"><li><a href="#druids-data-model">Druid's data model</a></ [...]
                 document.addEventListener('keyup', function(e) {
                   if (e.target !== document.body) {
                     return;
diff --git a/docs/latest/ingestion/tasks.html b/docs/latest/ingestion/tasks.html
index 19ad2805..3b592417 100644
--- a/docs/latest/ingestion/tasks.html
+++ b/docs/latest/ingestion/tasks.html
@@ -387,7 +387,7 @@ The settings get passed into the <code>context</code> field of the compaction ta
 <tr><td><code>forceTimeChunkLock</code></td><td>true</td><td><em>Setting this to false is still experimental</em><br/> Forces the task to always use a time chunk lock. If not set, each task automatically chooses a lock type to use. If set, this parameter overrides the <code>druid.indexer.tasklock.forceTimeChunkLock</code> <a href="/docs/latest/configuration/index.html#overlord-operations">configuration for the overlord</a>. See <a href="#locking">Locking</a> for more details.</td></tr>
 <tr><td><code>priority</code></td><td>Different based on task types. See <a href="#priority">Priority</a>.</td><td>Task priority</td></tr>
 <tr><td><code>useLineageBasedSegmentAllocation</code></td><td>false in 0.21 or earlier, true in 0.22 or later</td><td>Enable the new lineage-based segment allocation protocol for the native Parallel task with dynamic partitioning. This option should be off during a rolling upgrade from a Druid version between 0.19 and 0.21 to Druid 0.22 or higher. Once the upgrade is complete, it must be set to true to ensure data correctness.</td></tr>
-<tr><td><code>storeEmptyColumns</code></td><td>true</td><td>Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the <a href="/docs/latest/ingestion/ingestion-spec.html#dimensionsspec"><code>dimensionsSpec</code></a>. If you use schemaless ingestion and don't specify any dimensions to ingest, you must also set <a href="/docs/latest/ingestion/ingestion-spec.html#dimensionsspec"><code>includeAllDimensions</code>< [...]
+<tr><td><code>storeEmptyColumns</code></td><td>true</td><td>Boolean value for whether or not to store empty columns during ingestion. When set to true, Druid stores every column specified in the <a href="/docs/latest/ingestion/ingestion-spec.html#dimensionsspec"><code>dimensionsSpec</code></a>. <br/><br/>If you set <code>storeEmptyColumns</code> to false, Druid SQL queries referencing empty columns will fail. If you intend to leave <code>storeEmptyColumns</code> disabled, you should eith [...]
 </tbody>
 </table>
 <h2><a class="anchor" aria-hidden="true" id="task-logs"></a><a href="#task-logs" aria-hidden="true" class="hash-link"><svg class="hash-link-icon" aria-hidden="true" height="16" version="1.1" viewBox="0 0 16 16" width="16"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.6 [...]

