
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2019/06/13 07:57:41 UTC

[GitHub] [incubator-druid] vogievetsky commented on a change in pull request #7863: Added the web console to the quickstart tutorials and docs

URL: https://github.com/apache/incubator-druid/pull/7863#discussion_r293249124

##########
File path: docs/content/tutorials/tutorial-batch.md
##########
@@ -24,18 +24,98 @@ title: "Tutorial: Loading a file"

# Tutorial: Loading a file

-## Getting started
-
This tutorial demonstrates how to perform a batch file load, using Apache Druid (incubating)'s native batch ingestion.

For this tutorial, we'll assume you've already downloaded Druid as described in
the [quickstart](index.html) using the `micro-quickstart` single-machine configuration and have it
running on your local machine. You don't need to have loaded any data yet.

-## Preparing the data and the ingestion task spec
-
A data load is initiated by submitting an *ingestion task* spec to the Druid Overlord. For this tutorial, we'll be loading the sample Wikipedia page edits data.

+An ingestion spec can be written by hand, or you can use the "Data loader" that is built into the Druid console to help iteratively build one for you by sampling your data.
+The data loader currently only supports native batch ingestion (streaming support coming soon), so we can use it for this tutorial.
+
+We've included a sample of Wikipedia edits from September 12, 2015 to get you started.
+
+
+## Loading data with the data loader
+
+Navigate to [localhost:8888](http://localhost:8888), click `Load data` in the console header, and select `Local disk`.
+
+![Data loader init](../tutorials/img/tutorial-batch-data-loader-01.png "Data loader init")
+
+Enter `quickstart/tutorial/` as the base directory and `wikiticker-2015-09-12-sampled.json.gz` as a filter.
+The base directory and filter are specified separately in case you need to ingest data from multiple files.
+
+Click `Preview` and make sure that the data you are seeing is correct.
+
+![Data loader sample](../tutorials/img/tutorial-batch-data-loader-02.png "Data loader sample")
+
+Once the data is located, you can click "Next: Parse data" to go to the next step.
+The data loader will try to automatically determine the correct parser for the data.
+In this case it will successfully determine `json`.
+Feel free to play around with different parser options to get a preview of how Druid will parse your data.
+
+![Data loader parse data](../tutorials/img/tutorial-batch-data-loader-03.png "Data loader parse data")
+
+With the `json` parser selected, click `Next: Parse time` to get to the step centered around determining your primary timestamp column.
+Druid's architecture requires a primary timestamp column, which will be called `__time`. If your data does not contain a timestamp, you can always use a `Constant value` instead.
+In this case, the data loader will guess the `time` column as the primary time column, since it is the only one with values that look like timestamps.
+
+![Data loader parse time](../tutorials/img/tutorial-batch-data-loader-04.png "Data loader parse time")
+
+Click `Next: ...` twice to go past the `Transform` and `Filter` steps. You do not need to enter anything there, as applying ingestion-time transforms and filters is out of scope for this tutorial.
+
+In the schema stage, you can configure which dimensions (and metrics) will be ingested into Druid.
+This is exactly how the data will appear in Druid once it is ingested.
+Since our dataset is very small, go ahead and turn off `Rollup` by clicking the switch and confirming the change.
+
+![Data loader schema](../tutorials/img/tutorial-batch-data-loader-05.png "Data loader schema")
+
+Once you are satisfied with the schema, click `Next` to go to the `Partition` stage.
+Here you can adjust how the data will be split up into segments in Druid.
+Since this is such a small dataset, no adjustments need to be made in this step.
+
+![Data loader partition](../tutorials/img/tutorial-batch-data-loader-06.png "Data loader partition")
+
+Clicking past the `Tune` stage, we get to the publish stage, which is where we can specify what the data source will be called in Druid.
+Let's name this data source `wikipedia`.
+
+![Data loader publish](../tutorials/img/tutorial-batch-data-loader-07.png "Data loader publish")
+
+Finally, click `Next` to review your spec.
+This is the spec you have constructed.
+Feel free to go back to previous steps and see how changes made there are reflected in the spec.
+Similarly, you can edit the spec directly and see the changes reflected in the other stages.
+
+![Data loader spec](../tutorials/img/tutorial-batch-data-loader-08.png "Data loader spec")
+
+Once you are satisfied with the spec, click `Submit` and an ingestion task will be created.
+
+You will be taken to the task view with the focus on the newly created task.
+
+![Tasks view](../tutorials/img/tutorial-batch-data-loader-09.png "Tasks view")
+
+In the tasks view, you can click `Refresh` a couple of times until your ingestion task (hopefully) succeeds.
+
+When a task succeeds, it means that it built one or more segments that will now be picked up by the data servers.
+
+Navigate to the `Datasource` view and click refresh until your datasource (`wikipedia`) appears.
+This could take a few seconds as the segments are being loaded.
+
+![Datasource view](../tutorials/img/tutorial-batch-data-loader-10.png "Datasource view")
+
+Once you see the data source there with a green (fully available) circle, you can go to the `Query` view to run SQL queries against this datasource.

Review comment:
I think it is one: https://druid.apache.org/docs/latest/design/index.html#datasources-and-segments (will fix)
