Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/24 16:45:54 UTC

[GitHub] [druid] sthetland opened a new pull request #9766: Tutorial screen and light text updates

sthetland opened a new pull request #9766:
URL: https://github.com/apache/druid/pull/9766


   Druid Quickstart refactor and update.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] weishiuntsai commented on a change in pull request #9766: Druid Quickstart refactor and update

Posted by GitBox <gi...@apache.org>.
weishiuntsai commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r414966246



##########
File path: docs/tutorials/index.md
##########
@@ -99,96 +91,173 @@ $ ./bin/start-micro-quickstart
 [Fri May  3 11:40:50 2019] Running command[middleManager], logging to[/apache-druid-{{DRUIDVERSION}}/var/sv/middleManager.log]: bin/run-druid middleManager conf/druid/single-server/micro-quickstart
 ```
 
-All persistent state such as the cluster metadata store and segments for the services will be kept in the `var` directory under the apache-druid-{{DRUIDVERSION}} package root. Logs for the services are located at `var/sv`.
+All persistent state, such as the cluster metadata store and segments for the services, are kept in the `var` directory under 
+the Druid root directory, apache-druid-{{DRUIDVERSION}}. Each service writes to a log file under `var/sv`, as noted in the startup script output above.
+
+At any time, you can revert Druid to its original, post-installation state by deleting the entire `var` directory. You may
+want to do this, for example, between Druid tutorials or after experimentation, to start with a fresh instance. 
+
+To stop Druid at any time, use CTRL-C in the terminal. This exits the `bin/start-micro-quickstart` script and 
+terminates all Druid processes. 
+
 
-Later on, if you'd like to stop the services, CTRL-C to exit the `bin/start-micro-quickstart` script, which will terminate the Druid processes.
+## Step 3. Open the Druid console 
 
-Once the cluster has started, you can navigate to [http://localhost:8888](http://localhost:8888).
-The [Druid router process](../design/router.md), which serves the [Druid console](../operations/druid-console.md), resides at this address.
+After the Druid services finish startup, open the [Druid console](../operations/druid-console.md) at [http://localhost:8888](http://localhost:8888). 
 
 ![Druid console](../assets/tutorial-quickstart-01.png "Druid console")
 
-It takes a few seconds for all the Druid processes to fully start up. If you open the console immediately after starting the services, you may see some errors that you can safely ignore.
-
-
-## Loading data
-
-### Tutorial dataset
-
-For the following data loading tutorials, we have included a sample data file containing Wikipedia page edit events that occurred on 2015-09-12.
-
-This sample data is located at `quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid package root.
-The page edit events are stored as JSON objects in a text file.
-
-The sample data has the following columns, and an example event is shown below:
-
-  * added
-  * channel
-  * cityName
-  * comment
-  * countryIsoCode
-  * countryName
-  * deleted
-  * delta
-  * isAnonymous
-  * isMinor
-  * isNew
-  * isRobot
-  * isUnpatrolled
-  * metroCode
-  * namespace
-  * page
-  * regionIsoCode
-  * regionName
-  * user
-
-```json
-{
-  "timestamp":"2015-09-12T20:03:45.018Z",
-  "channel":"#en.wikipedia",
-  "namespace":"Main",
-  "page":"Spider-Man's powers and equipment",
-  "user":"foobar",
-  "comment":"/* Artificial web-shooters */",
-  "cityName":"New York",
-  "regionName":"New York",
-  "regionIsoCode":"NY",
-  "countryName":"United States",
-  "countryIsoCode":"US",
-  "isAnonymous":false,
-  "isNew":false,
-  "isMinor":false,
-  "isRobot":false,
-  "isUnpatrolled":false,
-  "added":99,
-  "delta":99,
-  "deleted":0,
-}
-```
+It may take a few seconds for all Druid services to finish starting, including the [Druid router](../design/router.md), which serves the console. If you attempt to open the Druid console before startup is complete, you may see errors in the browser. Wait a few moments and try again. 
 
 
-### Data loading tutorials
+## Step 4. Load data
 
-The following tutorials demonstrate various methods of loading data into Druid, including both batch and streaming use cases.
-All tutorials assume that you are using the `micro-quickstart` single-machine configuration mentioned above.
 
-- [Loading a file](./tutorial-batch.md) - this tutorial demonstrates how to perform a batch file load, using Druid's native batch ingestion.
-- [Loading stream data from Apache Kafka](./tutorial-kafka.md) - this tutorial demonstrates how to load streaming data from a Kafka topic.
-- [Loading a file using Apache Hadoop](./tutorial-batch-hadoop.md) - this tutorial demonstrates how to perform a batch file load, using a remote Hadoop cluster.
-- [Writing your own ingestion spec](./tutorial-ingestion-spec.md) - this tutorial demonstrates how to write a new ingestion spec and use it to load data.
+Ingestion specs define the schema of the data Druid reads and stores. You can write ingestion specs by hand or using the _data loader_, 
+as we will do here. 
 
-### Resetting cluster state
+For this tutorial, we'll load sample data bundled with Druid that represents Wikipedia page edits on a given day. 
 
-If you want a clean start after stopping the services, delete the `var` directory and run the `bin/start-micro-quickstart` script again.
+1. Click **Load data** from the Druid console header (![Load data](../assets/tutorial-batch-data-loader-00.png)).
 
-Once every service has started, you are now ready to load data.
+2. Select the **Local disk** tile and then click **Connect data**.
 
-#### Resetting Kafka
+   ![Data loader init](../assets/tutorial-batch-data-loader-01.png "Data loader init")
+
+3. Enter the following values: 
+
+   - **Base directory**: `quickstart/tutorial/`
+
+   - **File filter**: `wikiticker-2015-09-12-sampled.json.gz` 
+
+   ![Data location](../assets/tutorial-batch-data-loader-015.png "Data location")
+
+   Entering the base directory and [wildcard file filter](https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) separately, as afforded by the UI, allows you to specify multiple files for ingestion at once.
+
+4. Click **Apply**. 
+
+   The data loader displays the raw data, giving you a chance to verify that the data 
+   appears as expected. 
+
+   ![Data loader sample](../assets/tutorial-batch-data-loader-02.png "Data loader sample")
+
+   Notice that your position in the sequence of steps to load data, **Connect** in our case, appears at the top of the console, as shown below. 
+   You can click other steps to move forward or backward in the sequence at any time.
+   
+   ![Load data](../assets/tutorial-batch-data-loader-12.png)  
+   
+
+5. Click **Next: Parse data**. 
+
+   The data loader tries to determine the parser appropriate for the data format automatically. In this case 
+   it identifies the data format as `json`, as shown in the **Input format** field at the bottom right.
+
+   ![Data loader parse data](../assets/tutorial-batch-data-loader-03.png "Data loader parse data")
+
+   Feel free to select other **Input format** options to get a sense of their configuration settings 
+   and how Druid parses other types of data.  
+
+6. With the JSON parser selected, click **Next: Parse time**. The **Parse time** settings are where you view and adjust the 
+   primary timestamp column for the data.
+
+   ![Data loader parse time](../assets/tutorial-batch-data-loader-04.png "Data loader parse time")
+
+   Druid requires data to have a primary timestamp column (internally stored in a column called `__time`).
+   If you do not have a timestamp in your data, select `Constant value`. In our example, the data loader 
+   determines that the `time` column is the only candidate that can be used as the primary time column.
+
+7. Click **Next: Transform**, **Next: Filter**, and then **Next: Configure schema**, skipping a few steps.
+
+   You do not need to adjust transformation or filtering settings, as applying ingestion time transforms and 
+   filters are out of scope for this tutorial.
+
+8. The Configure schema settings are where you configure what [dimensions](../ingestion/index.md#dimensions) 
+   and [metrics](../ingestion/index.md#metrics) are ingested. The outcome of this configuration represents exactly how the 
+   data will appear in Druid after ingestion. 
+
+   Since our dataset is very small, you can turn off [rollup](../ingestion/index.md#rollup) 
+   by unsetting the **Rollup** switch and confirming the change when prompted.
+
+   ![Data loader schema](../assets/tutorial-batch-data-loader-05.png "Data loader schema")
+
+
+10. Click **Next: Partition** to configure how the data will be split into segments. In this case, choose `DAY` as 
+    the **Segment Granularity**. 

Review comment:
       "Segment Granularity" should be "Segment granularity"








[GitHub] [druid] weishiuntsai commented on a change in pull request #9766: Druid Quickstart refactor and update

Posted by GitBox <gi...@apache.org>.
weishiuntsai commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r414962574



##########
File path: docs/tutorials/index.md
##########
@@ -23,71 +23,63 @@ title: "Quickstart"
   -->
 
 
-In this quickstart, we will download Druid and set it up on a single machine. The cluster will be ready to load data
-after completing this initial setup.
+This quickstart gets you started with Apache Druid and introduces you to some of its basic features. 
+Following these steps, you will install Druid and load sample 
+data using its native batch ingestion feature. 
 
-Before beginning the quickstart, it is helpful to read the [general Druid overview](../design/index.md) and the
-[ingestion overview](../ingestion/index.md), as the tutorials will refer to concepts discussed on those pages.
+Before starting, you may want to read the [general Druid overview](../design/index.md) and
+[ingestion overview](../ingestion/index.md), as the tutorials refer to concepts discussed on those pages.
 
-## Prerequisites
+## Requirements
 
-### Software
+You can follow these steps on a relatively small machine, such as a laptop with around 4 CPU and 16 GB of RAM. 
 
-You will need:
+Druid comes with several startup configuration profiles for a range of machine sizes. 
+The `micro-quickstart`configuration profile shown here is suitable for early evaluation scenarios. To explore 
+Druid's performance or scaling capabilities, you'll need a larger machine.
 
-* **Java 8 (8u92+) or later**
-* Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
-
-> **Warning:** Druid only officially supports Java 8. Any Java version later than 8 is still experimental.
->
-> If needed, you can specify where to find Java using the environment variables `DRUID_JAVA_HOME` or `JAVA_HOME`. For more details run the verify-java script.
+The configuration profiles included with Druid range from the even smaller _Nano-Quickstart_ configuration (1 CPU, 4GB RAM) 
+to the _X-large_ configuration (64 CPU, 512GB RAM). For more information, see 

Review comment:
       Perhaps "X-Large" instead of "X-large", to be consistent with "Nano-Quickstart". single-server.md uses "nano-quickstart" and "xlarge" at the beginning, and "Nano-Quickstart" and "X-Large" at the end. If "Nano-Quickstart" is used here, I think "X-Large" should go with it.






[GitHub] [druid] weishiuntsai commented on a change in pull request #9766: Druid Quickstart refactor and update

Posted by GitBox <gi...@apache.org>.
weishiuntsai commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r414964427



##########
File path: docs/tutorials/index.md
##########
@@ -99,96 +91,173 @@ $ ./bin/start-micro-quickstart
 [Fri May  3 11:40:50 2019] Running command[middleManager], logging to[/apache-druid-{{DRUIDVERSION}}/var/sv/middleManager.log]: bin/run-druid middleManager conf/druid/single-server/micro-quickstart
 ```
 
-All persistent state such as the cluster metadata store and segments for the services will be kept in the `var` directory under the apache-druid-{{DRUIDVERSION}} package root. Logs for the services are located at `var/sv`.
+All persistent state, such as the cluster metadata store and segments for the services, are kept in the `var` directory under 
+the Druid root directory, apache-druid-{{DRUIDVERSION}}. Each service writes to a log file under `var/sv`, as noted in the startup script output above.
+
+At any time, you can revert Druid to its original, post-installation state by deleting the entire `var` directory. You may
+want to do this, for example, between Druid tutorials or after experimentation, to start with a fresh instance. 
+
+To stop Druid at any time, use CTRL-C in the terminal. This exits the `bin/start-micro-quickstart` script and 
+terminates all Druid processes. 
+
 
-Later on, if you'd like to stop the services, CTRL-C to exit the `bin/start-micro-quickstart` script, which will terminate the Druid processes.
+## Step 3. Open the Druid console 
 
-Once the cluster has started, you can navigate to [http://localhost:8888](http://localhost:8888).
-The [Druid router process](../design/router.md), which serves the [Druid console](../operations/druid-console.md), resides at this address.
+After the Druid services finish startup, open the [Druid console](../operations/druid-console.md) at [http://localhost:8888](http://localhost:8888). 
 
 ![Druid console](../assets/tutorial-quickstart-01.png "Druid console")
 
-It takes a few seconds for all the Druid processes to fully start up. If you open the console immediately after starting the services, you may see some errors that you can safely ignore.
-
-
-## Loading data
-
-### Tutorial dataset
-
-For the following data loading tutorials, we have included a sample data file containing Wikipedia page edit events that occurred on 2015-09-12.
-
-This sample data is located at `quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid package root.
-The page edit events are stored as JSON objects in a text file.
-
-The sample data has the following columns, and an example event is shown below:
-
-  * added
-  * channel
-  * cityName
-  * comment
-  * countryIsoCode
-  * countryName
-  * deleted
-  * delta
-  * isAnonymous
-  * isMinor
-  * isNew
-  * isRobot
-  * isUnpatrolled
-  * metroCode
-  * namespace
-  * page
-  * regionIsoCode
-  * regionName
-  * user
-
-```json
-{
-  "timestamp":"2015-09-12T20:03:45.018Z",
-  "channel":"#en.wikipedia",
-  "namespace":"Main",
-  "page":"Spider-Man's powers and equipment",
-  "user":"foobar",
-  "comment":"/* Artificial web-shooters */",
-  "cityName":"New York",
-  "regionName":"New York",
-  "regionIsoCode":"NY",
-  "countryName":"United States",
-  "countryIsoCode":"US",
-  "isAnonymous":false,
-  "isNew":false,
-  "isMinor":false,
-  "isRobot":false,
-  "isUnpatrolled":false,
-  "added":99,
-  "delta":99,
-  "deleted":0,
-}
-```
+It may take a few seconds for all Druid services to finish starting, including the [Druid router](../design/router.md), which serves the console. If you attempt to open the Druid console before startup is complete, you may see errors in the browser. Wait a few moments and try again. 
 
 
-### Data loading tutorials
+## Step 4. Load data
 
-The following tutorials demonstrate various methods of loading data into Druid, including both batch and streaming use cases.
-All tutorials assume that you are using the `micro-quickstart` single-machine configuration mentioned above.
 
-- [Loading a file](./tutorial-batch.md) - this tutorial demonstrates how to perform a batch file load, using Druid's native batch ingestion.
-- [Loading stream data from Apache Kafka](./tutorial-kafka.md) - this tutorial demonstrates how to load streaming data from a Kafka topic.
-- [Loading a file using Apache Hadoop](./tutorial-batch-hadoop.md) - this tutorial demonstrates how to perform a batch file load, using a remote Hadoop cluster.
-- [Writing your own ingestion spec](./tutorial-ingestion-spec.md) - this tutorial demonstrates how to write a new ingestion spec and use it to load data.
+Ingestion specs define the schema of the data Druid reads and stores. You can write ingestion specs by hand or using the _data loader_, 
+as we will do here. 
 
-### Resetting cluster state
+For this tutorial, we'll load sample data bundled with Druid that represents Wikipedia page edits on a given day. 

Review comment:
       I think it's worthwhile to mention the data file that we will be loading here. The original version has this sentence: "This sample data is located at quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz from the Druid package root." That makes things a bit clearer before diving into the base directory and file filter. I agree that the part about the columns can go; it felt like a bit too much even when I first read it.
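
For readers who want to see what that sample file contains before working through the data loader, here is a minimal Python sketch. It assumes the file is gzipped, newline-delimited JSON (one edit event per line) and that the script runs from the Druid package root; `first_record` and `SAMPLE` are hypothetical names used only for illustration.

```python
import gzip
import json
from pathlib import Path

# Path quoted in the review comment, relative to the Druid package root.
SAMPLE = Path("quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz")


def first_record(path):
    """Return the first JSON object from a gzipped, newline-delimited JSON file.

    Assumption: each non-empty line holds exactly one JSON object.
    Returns None if the file has no non-empty lines.
    """
    with gzip.open(path, "rt", encoding="utf-8") as lines:
        for line in lines:
            if line.strip():
                return json.loads(line)
    return None


if __name__ == "__main__":
    if SAMPLE.exists():
        record = first_record(SAMPLE)
        # Print the column names of the first event, sorted for readability.
        print(sorted(record or {}))
    else:
        print(f"{SAMPLE} not found; run this from the Druid package root")
```

Listing the keys of one record is a lightweight stand-in for the column list that the original page carried.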






[GitHub] [druid] weishiuntsai commented on a change in pull request #9766: Druid Quickstart refactor and update

Posted by GitBox <gi...@apache.org>.
weishiuntsai commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r414965988



##########
File path: docs/tutorials/index.md
##########
@@ -99,96 +91,173 @@ $ ./bin/start-micro-quickstart
 [Fri May  3 11:40:50 2019] Running command[middleManager], logging to[/apache-druid-{{DRUIDVERSION}}/var/sv/middleManager.log]: bin/run-druid middleManager conf/druid/single-server/micro-quickstart
 ```
 
-All persistent state such as the cluster metadata store and segments for the services will be kept in the `var` directory under the apache-druid-{{DRUIDVERSION}} package root. Logs for the services are located at `var/sv`.
+All persistent state, such as the cluster metadata store and segments for the services, are kept in the `var` directory under 
+the Druid root directory, apache-druid-{{DRUIDVERSION}}. Each service writes to a log file under `var/sv`, as noted in the startup script output above.
+
+At any time, you can revert Druid to its original, post-installation state by deleting the entire `var` directory. You may
+want to do this, for example, between Druid tutorials or after experimentation, to start with a fresh instance. 
+
+To stop Druid at any time, use CTRL-C in the terminal. This exits the `bin/start-micro-quickstart` script and 
+terminates all Druid processes. 
+
 
-Later on, if you'd like to stop the services, CTRL-C to exit the `bin/start-micro-quickstart` script, which will terminate the Druid processes.
+## Step 3. Open the Druid console 
 
-Once the cluster has started, you can navigate to [http://localhost:8888](http://localhost:8888).
-The [Druid router process](../design/router.md), which serves the [Druid console](../operations/druid-console.md), resides at this address.
+After the Druid services finish startup, open the [Druid console](../operations/druid-console.md) at [http://localhost:8888](http://localhost:8888). 
 
 ![Druid console](../assets/tutorial-quickstart-01.png "Druid console")
 
-It takes a few seconds for all the Druid processes to fully start up. If you open the console immediately after starting the services, you may see some errors that you can safely ignore.
-
-
-## Loading data
-
-### Tutorial dataset
-
-For the following data loading tutorials, we have included a sample data file containing Wikipedia page edit events that occurred on 2015-09-12.
-
-This sample data is located at `quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid package root.
-The page edit events are stored as JSON objects in a text file.
-
-The sample data has the following columns, and an example event is shown below:
-
-  * added
-  * channel
-  * cityName
-  * comment
-  * countryIsoCode
-  * countryName
-  * deleted
-  * delta
-  * isAnonymous
-  * isMinor
-  * isNew
-  * isRobot
-  * isUnpatrolled
-  * metroCode
-  * namespace
-  * page
-  * regionIsoCode
-  * regionName
-  * user
-
-```json
-{
-  "timestamp":"2015-09-12T20:03:45.018Z",
-  "channel":"#en.wikipedia",
-  "namespace":"Main",
-  "page":"Spider-Man's powers and equipment",
-  "user":"foobar",
-  "comment":"/* Artificial web-shooters */",
-  "cityName":"New York",
-  "regionName":"New York",
-  "regionIsoCode":"NY",
-  "countryName":"United States",
-  "countryIsoCode":"US",
-  "isAnonymous":false,
-  "isNew":false,
-  "isMinor":false,
-  "isRobot":false,
-  "isUnpatrolled":false,
-  "added":99,
-  "delta":99,
-  "deleted":0,
-}
-```
+It may take a few seconds for all Druid services to finish starting, including the [Druid router](../design/router.md), which serves the console. If you attempt to open the Druid console before startup is complete, you may see errors in the browser. Wait a few moments and try again. 
 
 
-### Data loading tutorials
+## Step 4. Load data
 
-The following tutorials demonstrate various methods of loading data into Druid, including both batch and streaming use cases.
-All tutorials assume that you are using the `micro-quickstart` single-machine configuration mentioned above.
 
-- [Loading a file](./tutorial-batch.md) - this tutorial demonstrates how to perform a batch file load, using Druid's native batch ingestion.
-- [Loading stream data from Apache Kafka](./tutorial-kafka.md) - this tutorial demonstrates how to load streaming data from a Kafka topic.
-- [Loading a file using Apache Hadoop](./tutorial-batch-hadoop.md) - this tutorial demonstrates how to perform a batch file load, using a remote Hadoop cluster.
-- [Writing your own ingestion spec](./tutorial-ingestion-spec.md) - this tutorial demonstrates how to write a new ingestion spec and use it to load data.
+Ingestion specs define the schema of the data Druid reads and stores. You can write ingestion specs by hand or using the _data loader_, 

Review comment:
       We might want to mention that this tutorial performs a batch file load using Druid's native batch ingestion. The original page split data loading into four links ("Loading a file", "Loading stream data from Apache Kafka", "Loading a file using Apache Hadoop", and "Writing your own ingestion spec"), each followed by an explanation. That made it clear what the user would be reading when clicking a link. Now that part of the content from "Loading a file" has moved here, it is less clear that we are doing a batch file load with native batch ingestion.
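
To make the "native batch ingestion" point concrete, the following Python sketch assembles the kind of ingestion spec the data loader steps in the diff appear to build. Treat it as an illustration under assumptions, not the spec the console actually emits: the field names follow the `index_parallel` task layout and should be checked against the ingestion docs for your release, the `wikipedia` datasource name and the `build_native_batch_spec` helper are hypothetical, and the empty dimension list stands in for whatever the **Configure schema** step selects. The base directory, file filter, `json` input format, `time` column, `DAY` segment granularity, and disabled rollup come from the tutorial text.

```python
import json


def build_native_batch_spec(base_dir, file_filter, datasource="wikipedia"):
    """Build a native batch (index_parallel) ingestion spec as a plain dict.

    Assumption: key names mirror Druid's native batch task format; verify
    them against the documentation for the Druid version in use.
    """
    return {
        "type": "index_parallel",
        "spec": {
            "ioConfig": {
                "type": "index_parallel",
                # "Local disk" tile: base directory plus wildcard file filter.
                "inputSource": {
                    "type": "local",
                    "baseDir": base_dir,
                    "filter": file_filter,
                },
                # Format detected in the "Parse data" step.
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {"type": "index_parallel"},
            "dataSchema": {
                "dataSource": datasource,  # hypothetical datasource name
                # Primary timestamp column chosen in the "Parse time" step.
                "timestampSpec": {"column": "time", "format": "iso"},
                # Placeholder: the real list comes from "Configure schema".
                "dimensionsSpec": {"dimensions": []},
                "granularitySpec": {
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
        },
    }


if __name__ == "__main__":
    spec = build_native_batch_spec(
        "quickstart/tutorial/", "wikiticker-2015-09-12-sampled.json.gz"
    )
    print(json.dumps(spec, indent=2))
```

Seeing the `index_parallel` task type and the `local` input source side by side shows why naming the ingestion method in the tutorial text helps: the same data loader screens produce a different spec for Kafka or Hadoop sources.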






[GitHub] [druid] weishiuntsai commented on a change in pull request #9766: Druid Quickstart refactor and update

Posted by GitBox <gi...@apache.org>.
weishiuntsai commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r414963007



##########
File path: docs/tutorials/index.md
##########
@@ -23,71 +23,63 @@ title: "Quickstart"
   -->
 
 
-In this quickstart, we will download Druid and set it up on a single machine. The cluster will be ready to load data
-after completing this initial setup.
+This quickstart gets you started with Apache Druid and introduces you to some of its basic features. 
+Following these steps, you will install Druid and load sample 
+data using its native batch ingestion feature. 
 
-Before beginning the quickstart, it is helpful to read the [general Druid overview](../design/index.md) and the
-[ingestion overview](../ingestion/index.md), as the tutorials will refer to concepts discussed on those pages.
+Before starting, you may want to read the [general Druid overview](../design/index.md) and
+[ingestion overview](../ingestion/index.md), as the tutorials refer to concepts discussed on those pages.
 
-## Prerequisites
+## Requirements
 
-### Software
+You can follow these steps on a relatively small machine, such as a laptop with around 4 CPU and 16 GB of RAM. 
 
-You will need:
+Druid comes with several startup configuration profiles for a range of machine sizes. 
+The `micro-quickstart`configuration profile shown here is suitable for early evaluation scenarios. To explore 
+Druid's performance or scaling capabilities, you'll need a larger machine.
 
-* **Java 8 (8u92+) or later**
-* Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
-
-> **Warning:** Druid only officially supports Java 8. Any Java version later than 8 is still experimental.
->
-> If needed, you can specify where to find Java using the environment variables `DRUID_JAVA_HOME` or `JAVA_HOME`. For more details run the verify-java script.
+The configuration profiles included with Druid range from the even smaller _Nano-Quickstart_ configuration (1 CPU, 4GB RAM) 
+to the _X-large_ configuration (64 CPU, 512GB RAM). For more information, see 
+[Single server deployment](operations/single-server). Alternatively, see [Clustered deployment](tutorials/cluster) for 
+information on deploying Druid services across clustered machines. 
 
-### Hardware
+The software requirements for the installation machine are:
 
-Druid includes several example [single-server configurations](../operations/single-server.md), along with scripts to
-start the Druid processes using these configurations.
+* Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
+* Java 8, Update 92 or later (8u92+)
 
-If you're running on a small machine such as a laptop for a quick evaluation, the `micro-quickstart` configuration is
-a good choice, sized for a 4CPU/16GB RAM environment.
+> Druid officially supports Java 8 only. Support for later major versions of Java is currently in experimental status.
 
-If you plan to use the single-machine deployment for further evaluation beyond the tutorials, we recommend a larger
-configuration than `micro-quickstart`.
+> Druid relies on the environment variables `JAVA_HOME` or `DRUID_JAVA_HOME` to find Java on the machine. You can set 
+`DRUID_JAVA_HOME` if there is more than one instance of Java. To verify Java requirements for your environment, run the 
+`bin/verify-java` script.
 
-## Getting started
 
-[Download](https://www.apache.org/dyn/closer.cgi?path=/druid/{{DRUIDVERSION}}/apache-druid-{{DRUIDVERSION}}-bin.tar.gz)
-the {{DRUIDVERSION}} release.
+## Step 1. Install Druid
 
-Extract Druid by running the following commands in your terminal:
+After confirming the [requirements](#requirements), follow these steps: 
 
-```bash
-tar -xzf apache-druid-{{DRUIDVERSION}}-bin.tar.gz
-cd apache-druid-{{DRUIDVERSION}}
-```
+1. Download
+the [{{DRUIDVERSION}} release](https://www.apache.org/dyn/closer.cgi?path=/druid/{{DRUIDVERSION}}/apache-druid-{{DRUIDVERSION}}-bin.tar.gz).
+2. In your terminal, extract Druid and change directories to the distribution directory:
 
-In the package, you should find:
+   ```bash
+   tar -xzf apache-druid-{{DRUIDVERSION}}-bin.tar.gz
+   cd apache-druid-{{DRUIDVERSION}}
+   ```
+In the directory, you'll find `LICENSE` and `NOTICE` files and subdirectories for executable files, configuration files, sample data and more.
 
-* `LICENSE` and `NOTICE` files
-* `bin/*` - scripts useful for this quickstart
-* `conf/*` - example configurations for single-server and clustered setup
-* `extensions/*` - core Druid extensions
-* `hadoop-dependencies/*` - Druid Hadoop dependencies
-* `lib/*` - libraries and dependencies for core Druid

Review comment:
       Personally, I found this part useful when reading the original version.  It gives the reader a sense of what's included in the package.






[GitHub] [druid] sthetland commented on a change in pull request #9766: Druid Quickstart refactor and update

Posted by GitBox <gi...@apache.org>.
sthetland commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r416053995



##########
File path: docs/tutorials/index.md
##########
@@ -99,96 +91,173 @@ $ ./bin/start-micro-quickstart
 [Fri May  3 11:40:50 2019] Running command[middleManager], logging to[/apache-druid-{{DRUIDVERSION}}/var/sv/middleManager.log]: bin/run-druid middleManager conf/druid/single-server/micro-quickstart
 ```
 
-All persistent state such as the cluster metadata store and segments for the services will be kept in the `var` directory under the apache-druid-{{DRUIDVERSION}} package root. Logs for the services are located at `var/sv`.
+All persistent state, such as the cluster metadata store and segments for the services, are kept in the `var` directory under 
+the Druid root directory, apache-druid-{{DRUIDVERSION}}. Each service writes to a log file under `var/sv`, as noted in the startup script output above.
+
+At any time, you can revert Druid to its original, post-installation state by deleting the entire `var` directory. You may
+want to do this, for example, between Druid tutorials or after experimentation, to start with a fresh instance. 
+
+To stop Druid at any time, use CTRL-C in the terminal. This exits the `bin/start-micro-quickstart` script and 
+terminates all Druid processes. 
+
 
-Later on, if you'd like to stop the services, CTRL-C to exit the `bin/start-micro-quickstart` script, which will terminate the Druid processes.
+## Step 3. Open the Druid console 
 
-Once the cluster has started, you can navigate to [http://localhost:8888](http://localhost:8888).
-The [Druid router process](../design/router.md), which serves the [Druid console](../operations/druid-console.md), resides at this address.
+After the Druid services finish startup, open the [Druid console](../operations/druid-console.md) at [http://localhost:8888](http://localhost:8888). 
 
 ![Druid console](../assets/tutorial-quickstart-01.png "Druid console")
 
-It takes a few seconds for all the Druid processes to fully start up. If you open the console immediately after starting the services, you may see some errors that you can safely ignore.
-
-
-## Loading data
-
-### Tutorial dataset
-
-For the following data loading tutorials, we have included a sample data file containing Wikipedia page edit events that occurred on 2015-09-12.
-
-This sample data is located at `quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid package root.
-The page edit events are stored as JSON objects in a text file.
-
-The sample data has the following columns, and an example event is shown below:
-
-  * added
-  * channel
-  * cityName
-  * comment
-  * countryIsoCode
-  * countryName
-  * deleted
-  * delta
-  * isAnonymous
-  * isMinor
-  * isNew
-  * isRobot
-  * isUnpatrolled
-  * metroCode
-  * namespace
-  * page
-  * regionIsoCode
-  * regionName
-  * user
-
-```json
-{
-  "timestamp":"2015-09-12T20:03:45.018Z",
-  "channel":"#en.wikipedia",
-  "namespace":"Main",
-  "page":"Spider-Man's powers and equipment",
-  "user":"foobar",
-  "comment":"/* Artificial web-shooters */",
-  "cityName":"New York",
-  "regionName":"New York",
-  "regionIsoCode":"NY",
-  "countryName":"United States",
-  "countryIsoCode":"US",
-  "isAnonymous":false,
-  "isNew":false,
-  "isMinor":false,
-  "isRobot":false,
-  "isUnpatrolled":false,
-  "added":99,
-  "delta":99,
-  "deleted":0,
-}
-```
+It may take a few seconds for all Druid services to finish starting, including the [Druid router](../design/router.md), which serves the console. If you attempt to open the Druid console before startup is complete, you may see errors in the browser. Wait a few moments and try again. 
 
 
-### Data loading tutorials
+## Step 4. Load data
 
-The following tutorials demonstrate various methods of loading data into Druid, including both batch and streaming use cases.
-All tutorials assume that you are using the `micro-quickstart` single-machine configuration mentioned above.
 
-- [Loading a file](./tutorial-batch.md) - this tutorial demonstrates how to perform a batch file load, using Druid's native batch ingestion.
-- [Loading stream data from Apache Kafka](./tutorial-kafka.md) - this tutorial demonstrates how to load streaming data from a Kafka topic.
-- [Loading a file using Apache Hadoop](./tutorial-batch-hadoop.md) - this tutorial demonstrates how to perform a batch file load, using a remote Hadoop cluster.
-- [Writing your own ingestion spec](./tutorial-ingestion-spec.md) - this tutorial demonstrates how to write a new ingestion spec and use it to load data.
+Ingestion specs define the schema of the data Druid reads and stores. You can write ingestion specs by hand or using the _data loader_, 
+as we will do here. 
 
-### Resetting cluster state
+For this tutorial, we'll load sample data bundled with Druid that represents Wikipedia page edits on a given day. 
 
-If you want a clean start after stopping the services, delete the `var` directory and run the `bin/start-micro-quickstart` script again.
+1. Click **Load data** from the Druid console header (![Load data](../assets/tutorial-batch-data-loader-00.png)).
 
-Once every service has started, you are now ready to load data.
+2. Select the **Local disk** tile and then click **Connect data**.
 
-#### Resetting Kafka
+   ![Data loader init](../assets/tutorial-batch-data-loader-01.png "Data loader init")
+
+3. Enter the following values: 
+
+   - **Base directory**: `quickstart/tutorial/`
+
+   - **File filter**: `wikiticker-2015-09-12-sampled.json.gz` 
+
+   ![Data location](../assets/tutorial-batch-data-loader-015.png "Data location")
+
+   Entering the base directory and [wildcard file filter](https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) separately, as afforded by the UI, allows you to specify multiple files for ingestion at once.
+
+4. Click **Apply**. 
+
+   The data loader displays the raw data, giving you a chance to verify that the data 
+   appears as expected. 
+
+   ![Data loader sample](../assets/tutorial-batch-data-loader-02.png "Data loader sample")
+
+   Notice that your position in the sequence of steps to load data, **Connect** in our case, appears at the top of the console, as shown below. 
+   You can click other steps to move forward or backward in the sequence at any time.
+   
+   ![Load data](../assets/tutorial-batch-data-loader-12.png)  
+   
+
+5. Click **Next: Parse data**. 
+
+   The data loader tries to determine the parser appropriate for the data format automatically. In this case 
+   it identifies the data format as `json`, as shown in the **Input format** field at the bottom right.
+
+   ![Data loader parse data](../assets/tutorial-batch-data-loader-03.png "Data loader parse data")
+
+   Feel free to select other **Input format** options to get a sense of their configuration settings 
+   and how Druid parses other types of data.  
+
+6. With the JSON parser selected, click **Next: Parse time**. The **Parse time** settings are where you view and adjust the 
+   primary timestamp column for the data.
+
+   ![Data loader parse time](../assets/tutorial-batch-data-loader-04.png "Data loader parse time")
+
+   Druid requires data to have a primary timestamp column (internally stored in a column called `__time`).
+   If you do not have a timestamp in your data, select `Constant value`. In our example, the data loader 
+   determines that the `time` column is the only candidate that can be used as the primary time column.
+
+7. Click **Next: Transform**, **Next: Filter**, and then **Next: Configure schema**, skipping a few steps.
+
+   You do not need to adjust transformation or filtering settings, as applying ingestion time transforms and 
+   filters are out of scope for this tutorial.
+
+8. The Configure schema settings are where you configure what [dimensions](../ingestion/index.md#dimensions) 
+   and [metrics](../ingestion/index.md#metrics) are ingested. The outcome of this configuration represents exactly how the 
+   data will appear in Druid after ingestion. 
+
+   Since our dataset is very small, you can turn off [rollup](../ingestion/index.md#rollup) 
+   by unsetting the **Rollup** switch and confirming the change when prompted.
+
+   ![Data loader schema](../assets/tutorial-batch-data-loader-05.png "Data loader schema")
+
+
+10. Click **Next: Partition** to configure how the data will be split into segments. In this case, choose `DAY` as 
+    the **Segment Granularity**. 
+
+    ![Data loader partition](../assets/tutorial-batch-data-loader-06.png "Data loader partition")
+
+    Since this is a small dataset, we can have just a single segment, which is what selecting `DAY` as the 
+    segment granularity gives us. 
+
+11. Click **Next: Tune** and **Next: Publish**.
+
+12. The Publish settings are where you can specify the datasource name in Druid. Change the default from `wikiticker-2015-09-12-sampled` 

Review comment:
       Done. Alternatively, I wondered whether we should give it a unique name, such as wikipedia-batchfile, so that the tutorial datasources can coexist without name collisions. Perhaps for later...






[GitHub] [druid] weishiuntsai commented on a change in pull request #9766: Druid Quickstart refactor and update

Posted by GitBox <gi...@apache.org>.
weishiuntsai commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r414961671



##########
File path: docs/tutorials/index.md
##########
@@ -23,71 +23,63 @@ title: "Quickstart"
   -->
 
 
-In this quickstart, we will download Druid and set it up on a single machine. The cluster will be ready to load data
-after completing this initial setup.
+This quickstart gets you started with Apache Druid and introduces you to some of its basic features. 
+Following these steps, you will install Druid and load sample 
+data using its native batch ingestion feature. 
 
-Before beginning the quickstart, it is helpful to read the [general Druid overview](../design/index.md) and the
-[ingestion overview](../ingestion/index.md), as the tutorials will refer to concepts discussed on those pages.
+Before starting, you may want to read the [general Druid overview](../design/index.md) and
+[ingestion overview](../ingestion/index.md), as the tutorials refer to concepts discussed on those pages.
 
-## Prerequisites
+## Requirements
 
-### Software
+You can follow these steps on a relatively small machine, such as a laptop with around 4 CPU and 16 GB of RAM. 
 
-You will need:
+Druid comes with several startup configuration profiles for a range of machine sizes. 
+The `micro-quickstart`configuration profile shown here is suitable for early evaluation scenarios. To explore 
+Druid's performance or scaling capabilities, you'll need a larger machine.

Review comment:
       Maybe "you'll need a larger machine and configuration profile"?  This paragraph talks about configuration profiles, and the original description mentioned "we recommend a larger configuration than micro-quickstart".  The focus on the configuration profile seems to have been lost here.






[GitHub] [druid] sthetland commented on a change in pull request #9766: Druid Quickstart refactor and update

Posted by GitBox <gi...@apache.org>.
sthetland commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r416059283



##########
File path: docs/tutorials/index.md
##########
@@ -23,71 +23,63 @@ title: "Quickstart"
   -->
 
 
-In this quickstart, we will download Druid and set it up on a single machine. The cluster will be ready to load data
-after completing this initial setup.
+This quickstart gets you started with Apache Druid and introduces you to some of its basic features. 
+Following these steps, you will install Druid and load sample 
+data using its native batch ingestion feature. 
 
-Before beginning the quickstart, it is helpful to read the [general Druid overview](../design/index.md) and the
-[ingestion overview](../ingestion/index.md), as the tutorials will refer to concepts discussed on those pages.
+Before starting, you may want to read the [general Druid overview](../design/index.md) and
+[ingestion overview](../ingestion/index.md), as the tutorials refer to concepts discussed on those pages.
 
-## Prerequisites
+## Requirements
 
-### Software
+You can follow these steps on a relatively small machine, such as a laptop with around 4 CPU and 16 GB of RAM. 
 
-You will need:
+Druid comes with several startup configuration profiles for a range of machine sizes. 
+The `micro-quickstart`configuration profile shown here is suitable for early evaluation scenarios. To explore 
+Druid's performance or scaling capabilities, you'll need a larger machine.
 
-* **Java 8 (8u92+) or later**
-* Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
-
-> **Warning:** Druid only officially supports Java 8. Any Java version later than 8 is still experimental.
->
-> If needed, you can specify where to find Java using the environment variables `DRUID_JAVA_HOME` or `JAVA_HOME`. For more details run the verify-java script.
+The configuration profiles included with Druid range from the even smaller _Nano-Quickstart_ configuration (1 CPU, 4GB RAM) 
+to the _X-large_ configuration (64 CPU, 512GB RAM). For more information, see 
+[Single server deployment](operations/single-server). Alternatively, see [Clustered deployment](tutorials/cluster) for 
+information on deploying Druid services across clustered machines. 
 
-### Hardware
+The software requirements for the installation machine are:
 
-Druid includes several example [single-server configurations](../operations/single-server.md), along with scripts to
-start the Druid processes using these configurations.
+* Linux, Mac OS X, or other Unix-like OS (Windows is not supported)
+* Java 8, Update 92 or later (8u92+)
 
-If you're running on a small machine such as a laptop for a quick evaluation, the `micro-quickstart` configuration is
-a good choice, sized for a 4CPU/16GB RAM environment.
+> Druid officially supports Java 8 only. Support for later major versions of Java is currently in experimental status.
 
-If you plan to use the single-machine deployment for further evaluation beyond the tutorials, we recommend a larger
-configuration than `micro-quickstart`.
+> Druid relies on the environment variables `JAVA_HOME` or `DRUID_JAVA_HOME` to find Java on the machine. You can set 
+`DRUID_JAVA_HOME` if there is more than one instance of Java. To verify Java requirements for your environment, run the 
+`bin/verify-java` script.
 
-## Getting started
 
-[Download](https://www.apache.org/dyn/closer.cgi?path=/druid/{{DRUIDVERSION}}/apache-druid-{{DRUIDVERSION}}-bin.tar.gz)
-the {{DRUIDVERSION}} release.
+## Step 1. Install Druid
 
-Extract Druid by running the following commands in your terminal:
+After confirming the [requirements](#requirements), follow these steps: 
 
-```bash
-tar -xzf apache-druid-{{DRUIDVERSION}}-bin.tar.gz
-cd apache-druid-{{DRUIDVERSION}}
-```
+1. Download
+the [{{DRUIDVERSION}} release](https://www.apache.org/dyn/closer.cgi?path=/druid/{{DRUIDVERSION}}/apache-druid-{{DRUIDVERSION}}-bin.tar.gz).
+2. In your terminal, extract Druid and change directories to the distribution directory:
 
-In the package, you should find:
+   ```bash
+   tar -xzf apache-druid-{{DRUIDVERSION}}-bin.tar.gz
+   cd apache-druid-{{DRUIDVERSION}}
+   ```
+In the directory, you'll find `LICENSE` and `NOTICE` files and subdirectories for executable files, configuration files, sample data and more.
 
-* `LICENSE` and `NOTICE` files
-* `bin/*` - scripts useful for this quickstart
-* `conf/*` - example configurations for single-server and clustered setup
-* `extensions/*` - core Druid extensions
-* `hadoop-dependencies/*` - Druid Hadoop dependencies
-* `lib/*` - libraries and dependencies for core Druid

Review comment:
       Thanks, I'll keep a note of this and see how this change goes over... I removed the list based on previous feedback (plus a directory and file were missing, so the content was out of date). 
   
   I'm open to restoring it, though, especially if installation isn't covered elsewhere. I'd say it's a lot of detail for a quickstart, especially one that has gotten much longer, but the right level for an installation guide.






[GitHub] [druid] weishiuntsai commented on a change in pull request #9766: Druid Quickstart refactor and update

Posted by GitBox <gi...@apache.org>.
weishiuntsai commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r414964427



##########
File path: docs/tutorials/index.md
##########
@@ -99,96 +91,173 @@ $ ./bin/start-micro-quickstart
 [Fri May  3 11:40:50 2019] Running command[middleManager], logging to[/apache-druid-{{DRUIDVERSION}}/var/sv/middleManager.log]: bin/run-druid middleManager conf/druid/single-server/micro-quickstart
 ```
 
-All persistent state such as the cluster metadata store and segments for the services will be kept in the `var` directory under the apache-druid-{{DRUIDVERSION}} package root. Logs for the services are located at `var/sv`.
+All persistent state, such as the cluster metadata store and segments for the services, are kept in the `var` directory under 
+the Druid root directory, apache-druid-{{DRUIDVERSION}}. Each service writes to a log file under `var/sv`, as noted in the startup script output above.
+
+At any time, you can revert Druid to its original, post-installation state by deleting the entire `var` directory. You may
+want to do this, for example, between Druid tutorials or after experimentation, to start with a fresh instance. 
+
+To stop Druid at any time, use CTRL-C in the terminal. This exits the `bin/start-micro-quickstart` script and 
+terminates all Druid processes. 
+
 
-Later on, if you'd like to stop the services, CTRL-C to exit the `bin/start-micro-quickstart` script, which will terminate the Druid processes.
+## Step 3. Open the Druid console 
 
-Once the cluster has started, you can navigate to [http://localhost:8888](http://localhost:8888).
-The [Druid router process](../design/router.md), which serves the [Druid console](../operations/druid-console.md), resides at this address.
+After the Druid services finish startup, open the [Druid console](../operations/druid-console.md) at [http://localhost:8888](http://localhost:8888). 
 
 ![Druid console](../assets/tutorial-quickstart-01.png "Druid console")
 
-It takes a few seconds for all the Druid processes to fully start up. If you open the console immediately after starting the services, you may see some errors that you can safely ignore.
-
-
-## Loading data
-
-### Tutorial dataset
-
-For the following data loading tutorials, we have included a sample data file containing Wikipedia page edit events that occurred on 2015-09-12.
-
-This sample data is located at `quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid package root.
-The page edit events are stored as JSON objects in a text file.
-
-The sample data has the following columns, and an example event is shown below:
-
-  * added
-  * channel
-  * cityName
-  * comment
-  * countryIsoCode
-  * countryName
-  * deleted
-  * delta
-  * isAnonymous
-  * isMinor
-  * isNew
-  * isRobot
-  * isUnpatrolled
-  * metroCode
-  * namespace
-  * page
-  * regionIsoCode
-  * regionName
-  * user
-
-```json
-{
-  "timestamp":"2015-09-12T20:03:45.018Z",
-  "channel":"#en.wikipedia",
-  "namespace":"Main",
-  "page":"Spider-Man's powers and equipment",
-  "user":"foobar",
-  "comment":"/* Artificial web-shooters */",
-  "cityName":"New York",
-  "regionName":"New York",
-  "regionIsoCode":"NY",
-  "countryName":"United States",
-  "countryIsoCode":"US",
-  "isAnonymous":false,
-  "isNew":false,
-  "isMinor":false,
-  "isRobot":false,
-  "isUnpatrolled":false,
-  "added":99,
-  "delta":99,
-  "deleted":0,
-}
-```
+It may take a few seconds for all Druid services to finish starting, including the [Druid router](../design/router.md), which serves the console. If you attempt to open the Druid console before startup is complete, you may see errors in the browser. Wait a few moments and try again. 
 
 
-### Data loading tutorials
+## Step 4. Load data
 
-The following tutorials demonstrate various methods of loading data into Druid, including both batch and streaming use cases.
-All tutorials assume that you are using the `micro-quickstart` single-machine configuration mentioned above.
 
-- [Loading a file](./tutorial-batch.md) - this tutorial demonstrates how to perform a batch file load, using Druid's native batch ingestion.
-- [Loading stream data from Apache Kafka](./tutorial-kafka.md) - this tutorial demonstrates how to load streaming data from a Kafka topic.
-- [Loading a file using Apache Hadoop](./tutorial-batch-hadoop.md) - this tutorial demonstrates how to perform a batch file load, using a remote Hadoop cluster.
-- [Writing your own ingestion spec](./tutorial-ingestion-spec.md) - this tutorial demonstrates how to write a new ingestion spec and use it to load data.
+Ingestion specs define the schema of the data Druid reads and stores. You can write ingestion specs by hand or using the _data loader_, 
+as we will do here. 
 
-### Resetting cluster state
+For this tutorial, we'll load sample data bundled with Druid that represents Wikipedia page edits on a given day. 

Review comment:
       I think it's worthwhile to mention the data file that we will be loading here.  The original version has this part: "This sample data is located at quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz from the Druid package root."  That makes things a bit clearer before diving into the base directory and file filter.






[GitHub] [druid] sthetland commented on a change in pull request #9766: Druid Quickstart refactor and update

Posted by GitBox <gi...@apache.org>.
sthetland commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r416038824



##########
File path: docs/tutorials/index.md
##########
@@ -23,71 +23,63 @@ title: "Quickstart"
   -->
 
 
-In this quickstart, we will download Druid and set it up on a single machine. The cluster will be ready to load data
-after completing this initial setup.
+This quickstart gets you started with Apache Druid and introduces you to some of its basic features. 
+Following these steps, you will install Druid and load sample 
+data using its native batch ingestion feature. 
 
-Before beginning the quickstart, it is helpful to read the [general Druid overview](../design/index.md) and the
-[ingestion overview](../ingestion/index.md), as the tutorials will refer to concepts discussed on those pages.
+Before starting, you may want to read the [general Druid overview](../design/index.md) and
+[ingestion overview](../ingestion/index.md), as the tutorials refer to concepts discussed on those pages.
 
-## Prerequisites
+## Requirements
 
-### Software
+You can follow these steps on a relatively small machine, such as a laptop with around 4 CPU and 16 GB of RAM. 
 
-You will need:
+Druid comes with several startup configuration profiles for a range of machine sizes. 
+The `micro-quickstart`configuration profile shown here is suitable for early evaluation scenarios. To explore 
+Druid's performance or scaling capabilities, you'll need a larger machine.

Review comment:
       Done.






[GitHub] [druid] 2bethere commented on pull request #9766: Druid Quickstart refactor and update

Posted by GitBox <gi...@apache.org>.
2bethere commented on pull request #9766:
URL: https://github.com/apache/druid/pull/9766#issuecomment-620434325


   No comment, LGTM.




[GitHub] [druid] jon-wei commented on pull request #9766: Druid Quickstart refactor and update

Posted by GitBox <gi...@apache.org>.
jon-wei commented on pull request #9766:
URL: https://github.com/apache/druid/pull/9766#issuecomment-621373691


   @sthetland There are some conflicts on:
   
   ```
   docs/assets/tutorial-batch-data-loader-02.png
   docs/assets/tutorial-batch-data-loader-03.png
   docs/assets/tutorial-batch-data-loader-04.png
   docs/tutorials/tutorial-batch.md
   ```




[GitHub] [druid] weishiuntsai commented on a change in pull request #9766: Druid Quickstart refactor and update

Posted by GitBox <gi...@apache.org>.
weishiuntsai commented on a change in pull request #9766:
URL: https://github.com/apache/druid/pull/9766#discussion_r414967668



##########
File path: docs/tutorials/index.md
##########
@@ -99,96 +91,173 @@ $ ./bin/start-micro-quickstart
 [Fri May  3 11:40:50 2019] Running command[middleManager], logging to[/apache-druid-{{DRUIDVERSION}}/var/sv/middleManager.log]: bin/run-druid middleManager conf/druid/single-server/micro-quickstart
 ```
 
-All persistent state such as the cluster metadata store and segments for the services will be kept in the `var` directory under the apache-druid-{{DRUIDVERSION}} package root. Logs for the services are located at `var/sv`.
+All persistent state, such as the cluster metadata store and segments for the services, is kept in the `var` directory under 
+the Druid root directory, apache-druid-{{DRUIDVERSION}}. Each service writes to a log file under `var/sv`, as noted in the startup script output above.
+
+At any time, you can revert Druid to its original, post-installation state by deleting the entire `var` directory. You may
+want to do this, for example, between Druid tutorials or after experimentation, to start with a fresh instance. 
+
+To stop Druid at any time, use CTRL-C in the terminal. This exits the `bin/start-micro-quickstart` script and 
+terminates all Druid processes. 
+
 
-Later on, if you'd like to stop the services, CTRL-C to exit the `bin/start-micro-quickstart` script, which will terminate the Druid processes.
+## Step 3. Open the Druid console 
 
-Once the cluster has started, you can navigate to [http://localhost:8888](http://localhost:8888).
-The [Druid router process](../design/router.md), which serves the [Druid console](../operations/druid-console.md), resides at this address.
+After the Druid services finish startup, open the [Druid console](../operations/druid-console.md) at [http://localhost:8888](http://localhost:8888). 
 
 ![Druid console](../assets/tutorial-quickstart-01.png "Druid console")
 
-It takes a few seconds for all the Druid processes to fully start up. If you open the console immediately after starting the services, you may see some errors that you can safely ignore.
-
-
-## Loading data
-
-### Tutorial dataset
-
-For the following data loading tutorials, we have included a sample data file containing Wikipedia page edit events that occurred on 2015-09-12.
-
-This sample data is located at `quickstart/tutorial/wikiticker-2015-09-12-sampled.json.gz` from the Druid package root.
-The page edit events are stored as JSON objects in a text file.
-
-The sample data has the following columns, and an example event is shown below:
-
-  * added
-  * channel
-  * cityName
-  * comment
-  * countryIsoCode
-  * countryName
-  * deleted
-  * delta
-  * isAnonymous
-  * isMinor
-  * isNew
-  * isRobot
-  * isUnpatrolled
-  * metroCode
-  * namespace
-  * page
-  * regionIsoCode
-  * regionName
-  * user
-
-```json
-{
-  "timestamp":"2015-09-12T20:03:45.018Z",
-  "channel":"#en.wikipedia",
-  "namespace":"Main",
-  "page":"Spider-Man's powers and equipment",
-  "user":"foobar",
-  "comment":"/* Artificial web-shooters */",
-  "cityName":"New York",
-  "regionName":"New York",
-  "regionIsoCode":"NY",
-  "countryName":"United States",
-  "countryIsoCode":"US",
-  "isAnonymous":false,
-  "isNew":false,
-  "isMinor":false,
-  "isRobot":false,
-  "isUnpatrolled":false,
-  "added":99,
-  "delta":99,
-  "deleted":0,
-}
-```
+It may take a few seconds for all Druid services to finish starting, including the [Druid router](../design/router.md), which serves the console. If you attempt to open the Druid console before startup is complete, you may see errors in the browser. Wait a few moments and try again. 
 
 
-### Data loading tutorials
+## Step 4. Load data
 
-The following tutorials demonstrate various methods of loading data into Druid, including both batch and streaming use cases.
-All tutorials assume that you are using the `micro-quickstart` single-machine configuration mentioned above.
 
-- [Loading a file](./tutorial-batch.md) - this tutorial demonstrates how to perform a batch file load, using Druid's native batch ingestion.
-- [Loading stream data from Apache Kafka](./tutorial-kafka.md) - this tutorial demonstrates how to load streaming data from a Kafka topic.
-- [Loading a file using Apache Hadoop](./tutorial-batch-hadoop.md) - this tutorial demonstrates how to perform a batch file load, using a remote Hadoop cluster.
-- [Writing your own ingestion spec](./tutorial-ingestion-spec.md) - this tutorial demonstrates how to write a new ingestion spec and use it to load data.
+Ingestion specs define the schema of the data Druid reads and stores. You can write ingestion specs by hand or using the _data loader_, 
+as we will do here. 
 
-### Resetting cluster state
+For this tutorial, we'll load sample data bundled with Druid that represents Wikipedia page edits on a given day. 
 
-If you want a clean start after stopping the services, delete the `var` directory and run the `bin/start-micro-quickstart` script again.
+1. Click **Load data** from the Druid console header (![Load data](../assets/tutorial-batch-data-loader-00.png)).
 
-Once every service has started, you are now ready to load data.
+2. Select the **Local disk** tile and then click **Connect data**.
 
-#### Resetting Kafka
+   ![Data loader init](../assets/tutorial-batch-data-loader-01.png "Data loader init")
+
+3. Enter the following values: 
+
+   - **Base directory**: `quickstart/tutorial/`
+
+   - **File filter**: `wikiticker-2015-09-12-sampled.json.gz` 
+
+   ![Data location](../assets/tutorial-batch-data-loader-015.png "Data location")
+
+   Entering the base directory and [wildcard file filter](https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/filefilter/WildcardFileFilter.html) separately, as afforded by the UI, allows you to specify multiple files for ingestion at once.
+
+4. Click **Apply**. 
+
+   The data loader displays the raw data, giving you a chance to verify that the data 
+   appears as expected. 
+
+   ![Data loader sample](../assets/tutorial-batch-data-loader-02.png "Data loader sample")
+
+   Notice that your position in the sequence of steps to load data, **Connect** in our case, appears at the top of the console, as shown below. 
+   You can click other steps to move forward or backward in the sequence at any time.
+   
+   ![Load data](../assets/tutorial-batch-data-loader-12.png)  
+   
+
+5. Click **Next: Parse data**. 
+
+   The data loader tries to determine the parser appropriate for the data format automatically. In this case 
+   it identifies the data format as `json`, as shown in the **Input format** field at the bottom right.
+
+   ![Data loader parse data](../assets/tutorial-batch-data-loader-03.png "Data loader parse data")
+
+   Feel free to select other **Input format** options to get a sense of their configuration settings 
+   and how Druid parses other types of data.  
+
+6. With the JSON parser selected, click **Next: Parse time**. The **Parse time** settings are where you view and adjust the 
+   primary timestamp column for the data.
+
+   ![Data loader parse time](../assets/tutorial-batch-data-loader-04.png "Data loader parse time")
+
+   Druid requires data to have a primary timestamp column (internally stored in a column called `__time`).
+   If you do not have a timestamp in your data, select `Constant value`. In our example, the data loader 
+   determines that the `time` column is the only candidate that can be used as the primary time column.
+
+7. Click **Next: Transform**, **Next: Filter**, and then **Next: Configure schema**, skipping a few steps.
+
+   You do not need to adjust transformation or filtering settings, because applying ingestion-time transforms and 
+   filters is out of scope for this tutorial.
+
+8. The Configure schema settings are where you configure what [dimensions](../ingestion/index.md#dimensions) 
+   and [metrics](../ingestion/index.md#metrics) are ingested. The outcome of this configuration represents exactly how the 
+   data will appear in Druid after ingestion. 
+
+   Since our dataset is very small, you can turn off [rollup](../ingestion/index.md#rollup) 
+   by unsetting the **Rollup** switch and confirming the change when prompted.
+
+   ![Data loader schema](../assets/tutorial-batch-data-loader-05.png "Data loader schema")
+
+
+9. Click **Next: Partition** to configure how the data will be split into segments. In this case, choose `DAY` as 
+   the **Segment Granularity**. 
+
+   ![Data loader partition](../assets/tutorial-batch-data-loader-06.png "Data loader partition")
+
+   Since this is a small dataset, we can have just a single segment, which is what selecting `DAY` as the 
+   segment granularity gives us. 
+
+10. Click **Next: Tune** and **Next: Publish**.
+
+11. The Publish settings are where you can specify the datasource name in Druid. Change the default from `wikiticker-2015-09-12-sampled` 

Review comment:
       Perhaps incorporate the original sentence, "Let's name this datasource wikipedia," here, along with "and change the default from `wikiticker-2015-09-12-sampled`"? The original sentence makes it clearer that the user has a choice here.
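
For context on the setting discussed here: the datasource name entered in the Publish step ends up as the `dataSource` field of the ingestion spec that the data loader assembles. The following is a minimal Python sketch of that idea, assuming the native batch `index_parallel` spec layout and reusing the values from the quoted tutorial steps (base directory, file filter, `time` column, `DAY` granularity, rollup off). The helper `build_ingestion_spec` and the short dimension list are hypothetical and for illustration only; this is not the data loader's actual output.

```python
import json


def build_ingestion_spec(base_dir, file_filter, datasource,
                         segment_granularity="DAY", rollup=False):
    """Assemble a native batch ingestion spec as a plain dict (illustrative only)."""
    return {
        "type": "index_parallel",
        "spec": {
            "ioConfig": {
                "type": "index_parallel",
                # Base directory and file filter are given separately,
                # mirroring the two fields in the Connect step.
                "inputSource": {
                    "type": "local",
                    "baseDir": base_dir,
                    "filter": file_filter,
                },
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {
                "type": "index_parallel",
                "partitionsSpec": {"type": "dynamic"},
            },
            "dataSchema": {
                # The name chosen in the Publish step.
                "dataSource": datasource,
                # The tutorial text names `time` as the primary timestamp column.
                "timestampSpec": {"column": "time", "format": "iso"},
                # Hypothetical subset of the sample data's columns.
                "dimensionsSpec": {"dimensions": ["channel", "page", "user"]},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": segment_granularity,
                    "queryGranularity": "NONE",
                    "rollup": rollup,
                },
            },
        },
    }


spec = build_ingestion_spec(
    base_dir="quickstart/tutorial/",
    file_filter="wikiticker-2015-09-12-sampled.json.gz",
    datasource="wikipedia",
)
print(json.dumps(spec, indent=2))
```

Passing a different `datasource` argument changes only the `dataSource` field, which is the choice the tutorial text could make explicit to the user.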



