Posted to commits@kudu.apache.org by gr...@apache.org on 2020/03/24 15:11:15 UTC

[kudu] 02/02: [quickstart] Add an Apache Impala quickstart guide

This is an automated email from the ASF dual-hosted git repository.

granthenke pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit e4596ae406c2d0fb621d0e4b8e002c4c600ef291
Author: Grant Henke <gr...@apache.org>
AuthorDate: Tue Mar 10 17:02:51 2020 -0500

    [quickstart] Add an Apache Impala quickstart guide
    
    This patch adds a short guide to illustrate basic Impala features
    when using Kudu tables. Follow-on guides can provide more detailed
    and complicated examples.
    
    To enable others to test this guide I pushed an image to my personal
    dockerhub repo. Use `granthenke/kudu:impala-latest` in place of
    `apache/kudu:impala-latest` when walking through the guide.
    
    I also added help sections to the existing guides and a resource
    configuration section to the top level guide based on feedback
    from users walking through this example.
    
    Change-Id: I208623add96f47119d36ac975b04352854398983
    Reviewed-on: http://gerrit.cloudera.org:8080/15397
    Tested-by: Kudu Jenkins
    Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
 docs/quickstart.adoc                   |  19 ++-
 examples/quickstart/impala/README.adoc | 221 +++++++++++++++++++++++++++++++++
 examples/quickstart/nifi/README.adoc   |  11 +-
 examples/quickstart/spark/README.adoc  |   5 +
 examples/quickstart/ycsb/README.adoc   |   6 +
 5 files changed, 257 insertions(+), 5 deletions(-)

diff --git a/docs/quickstart.adoc b/docs/quickstart.adoc
index 46e06a9..39a3286 100644
--- a/docs/quickstart.adoc
+++ b/docs/quickstart.adoc
@@ -40,6 +40,18 @@ be used for production or performance/scale testing.
 Follow the Docker link:https://docs.docker.com/install/[install documentation]
 to install docker in your Linux, Mac, or Windows environment.
 
+Configure the Docker install to have enough resources to run the quickstart guides.
+
+- link:https://docs.docker.com/docker-for-mac/#resources[Docker for Mac Resource Configuration Guide]
+
+A minimum configuration that can run all the quickstart examples comfortably is:
+
+- 4 CPUs
+- 6 GB Memory
+- 50 GB Disk
+
+NOTE: You can likely get by with a lower resource configuration, but you may lose some performance and stability.
+
 You may also want to read through the Docker getting started guide, but that isn't a requirement.
 
 == Clone the Repository
@@ -72,10 +84,10 @@ you can specify the master addresses with `localhost:7051,localhost:7151,localho
 
 [source,bash]
 ----
-docker-compose -f docker/quickstart.yml up
+docker-compose -f docker/quickstart.yml up -d
 ----
 
-NOTE: You can include the `-d` flag to run the cluster in the background.
+NOTE: You can remove the `-d` flag to run the cluster in the foreground.
 
 === View the Web-UI
 
@@ -129,6 +141,9 @@ More complete walkthroughs using the quickstart Kudu cluster can be found in the
 `examples/quickstart` directory. For convenience you can browse them on
 link:https://github.com/apache/kudu/tree/master/examples/quickstart[Github].
 
+- link:https://github.com/apache/kudu/tree/master/examples/quickstart/nifi[NiFi Quickstart Guide]
+- link:https://github.com/apache/kudu/tree/master/examples/quickstart/spark[Spark Quickstart Guide]
+- link:https://github.com/apache/kudu/tree/master/examples/quickstart/impala[Impala Quickstart Guide]
 
 == Destroying the Cluster
 
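The minimum resource configuration added to docs/quickstart.adoc above (4 CPUs, 6 GB memory) can be checked from the command line before starting the cluster. The following is a rough sketch, not part of this commit. It assumes `docker info --format` exposes the `NCPU` and `MemTotal` fields, and it interprets 6 GB as 6 * 10^9 bytes, which is an assumption. `meets_minimum` is a hypothetical helper name. The memory Docker reports can differ slightly from the configured value, so treat the result as a hint only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the given CPU count and memory (in bytes)
# meet the 4 CPU / 6 GB minimum suggested in the quickstart docs.
# Assumption: 6 GB is taken to mean 6 * 10^9 bytes.
meets_minimum() {
  local cpus="$1" mem_bytes="$2"
  local min_cpus=4
  local min_mem_bytes=$((6 * 1000 * 1000 * 1000))
  [ "$cpus" -ge "$min_cpus" ] && [ "$mem_bytes" -ge "$min_mem_bytes" ]
}

# Only query Docker when the CLI is installed and the daemon is reachable.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
  docker_cpus="$(docker info --format '{{.NCPU}}')"
  docker_mem="$(docker info --format '{{.MemTotal}}')"
  if meets_minimum "$docker_cpus" "$docker_mem"; then
    echo "Docker resources look sufficient: ${docker_cpus} CPUs, ${docker_mem} bytes of memory"
  else
    echo "Docker resources are below the suggested minimum: ${docker_cpus} CPUs, ${docker_mem} bytes of memory"
  fi
fi
```

If the check reports less than the suggested minimum, the quickstart may still run, as the NOTE in the docs says, but with reduced performance and stability.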
diff --git a/examples/quickstart/impala/README.adoc b/examples/quickstart/impala/README.adoc
new file mode 100644
index 0000000..c0f7644
--- /dev/null
+++ b/examples/quickstart/impala/README.adoc
@@ -0,0 +1,221 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+= Apache Impala Quickstart
+
+Below is a brief example using Apache Impala to insert, update, delete, and query data in Apache Kudu.
+
+== Start the Kudu Quickstart Environment
+
+See the Apache Kudu
+link:https://kudu.apache.org/docs/quickstart.html[quickstart documentation]
+to set up and run the Kudu quickstart environment.
+
+== Run Apache Impala
+
+Use the following command to run the latest Apache Impala docker image:
+
+NOTE: This docker image is a single-node, Kudu-only image published for use in this quickstart only.
+The image runs a Hive metastore backed by a Derby database along with an Impala statestore daemon,
+catalog daemon, and executor daemon. The docker run command below exposes the required RPC and HTTP ports.
+
+[source,bash]
+----
+docker run -d --name kudu-impala --network="docker_default" \
+  -p 21000:21000 -p 21050:21050 -p 25000:25000 -p 25010:25010 -p 25020:25020 \
+  --memory=4096m apache/kudu:impala-latest impala
+----
+
+You can view the running Impala instance at link:http://localhost:25000[localhost:25000]
+once it is up and running. It may take a few seconds to start.
+
+NOTE: `--network="docker_default"` is specified to connect the container to the
+same network as the quickstart cluster.
+
+NOTE: You can remove the `-d` flag to run the container in the foreground.
+
+
+== Run the impala-shell
+
+Use the command below to enter the `impala-shell` in the `kudu-impala` container:
+
+[source,bash]
+----
+docker exec -it kudu-impala impala-shell
+----
+
+NOTE: If the `impala-shell` says "Could not connect", wait a few more seconds to give
+Impala time to start and then enter `connect;` in the shell to try again.
+
+== Create a Kudu Table
+
+Now that you are in an `impala-shell` that is connected to Impala you can use an
+link:https://impala.apache.org/docs/build/html/topics/impala_ddl.html[Impala DDL statement]
+to create a Kudu table.
+
+[source,sql]
+----
+CREATE TABLE my_first_table
+(
+  id BIGINT,
+  name STRING,
+  PRIMARY KEY(id)
+)
+PARTITION BY HASH PARTITIONS 4
+STORED AS KUDU;
+
+DESCRIBE my_first_table;
+----
+
+== Insert and Modify Data
+
+With `my_first_table` created you can now use
+link:https://impala.apache.org/docs/build/html/topics/impala_dml.html[Impala DML statements]
+to `INSERT`, `UPDATE`, `UPSERT`, and `DELETE` data.
+
+[source,sql]
+----
+-- Insert a row.
+INSERT INTO my_first_table VALUES (99, "sarah");
+SELECT * FROM my_first_table;
+
+-- Insert multiple rows.
+INSERT INTO my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim");
+SELECT * FROM my_first_table;
+
+-- Update a row.
+UPDATE my_first_table SET name="bob" WHERE id = 3;
+SELECT * FROM my_first_table;
+
+-- Use upsert to insert a new row and update another.
+UPSERT INTO my_first_table VALUES (3, "bobby"), (4, "grant");
+SELECT * FROM my_first_table;
+
+-- Delete a row.
+DELETE FROM my_first_table WHERE id = 99;
+SELECT * FROM my_first_table;
+
+-- Delete multiple rows.
+DELETE FROM my_first_table WHERE id < 3;
+SELECT * FROM my_first_table;
+----
+
+== Create an External Table
+
+Sometimes users want to create an Impala table that points to an existing Kudu table.
+This can be achieved by using an
+link:https://impala.apache.org/docs/build/html/topics/impala_tables.html#external_tables[external table] in Impala.
+This will create an Impala table entry that points to the existing underlying Kudu table.
+
+[source,sql]
+----
+CREATE EXTERNAL TABLE my_second_table
+STORED AS KUDU
+TBLPROPERTIES('kudu.table_name' = 'impala::default.my_first_table');
+
+DESCRIBE my_second_table;
+
+DESCRIBE EXTENDED my_second_table;
+----
+
+== Drop the Tables
+
+You can drop the tables with a simple Impala `DROP TABLE` statement.
+When you drop the external table, the underlying Kudu table still exists.
+When you drop the managed table, the underlying Kudu data is dropped as well.
+
+[source,sql]
+----
+DROP TABLE my_second_table;
+
+DESCRIBE my_first_table;
+SELECT * FROM my_first_table;
+
+DROP TABLE my_first_table;
+----
+
+== Exit the impala-shell
+
+Use the statement below to exit the `impala-shell` in the `kudu-impala` container:
+
+[source,bash]
+----
+exit;
+----
+
+== Shutdown Impala
+
+Once you are done with the Impala container, you can shut it down in one of two ways.
+If you ran Impala without the `-d` flag, you can use `ctrl + c` to stop the container.
+
+If you ran Impala with the `-d` flag, you can use the following to
+gracefully shut down the container:
+
+[source,bash]
+----
+docker stop kudu-impala
+----
+
+To permanently remove the container run the following:
+
+[source,bash]
+----
+docker rm kudu-impala
+----
+
+== Next steps
+
+The above example illustrates the basics of interacting with Kudu tables in Apache Impala.
+Next, explore the other quickstart guides to learn how to ingest data using other tools.
+
+For example, the link:https://github.com/apache/kudu/tree/master/examples/quickstart/spark[Spark quickstart guide]
+and link:https://github.com/apache/kudu/tree/master/examples/quickstart/nifi[NiFi quickstart guide]
+will walk you through how to ingest and process data in Kudu. You can follow those quickstart guides
+and query the data ingested using the steps described in this quickstart.
+
+If you have already run through the Spark quickstart, the following example
+shows how to query the `sfmta_kudu` table:
+
+[source,sql]
+----
+CREATE EXTERNAL TABLE sfmta_kudu
+STORED AS KUDU
+TBLPROPERTIES('kudu.table_name' = 'sfmta_kudu');
+
+SELECT * FROM sfmta_kudu
+ORDER BY speed
+LIMIT 5;
+----
+
+If you have already run through the NiFi quickstart, the following example
+shows how to query the `random_user` table:
+
+[source,sql]
+----
+CREATE EXTERNAL TABLE random_user
+STORED AS KUDU
+TBLPROPERTIES('kudu.table_name' = 'random_user');
+
+SELECT count(*) FROM random_user;
+
+SELECT * FROM random_user LIMIT 5;
+----
+
+== Help
+
+If you have questions, issues, or feedback on this quickstart guide, please reach out to the
+link:https://kudu.apache.org/community.html[Apache Kudu community].
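The Impala README above notes that `impala-shell` may report "Could not connect" until Impala has finished starting. For scripted use, that wait can be automated. The following is a minimal sketch, not part of this commit. It assumes the `kudu-impala` container from the guide exists, that `impala-shell -q` runs a single statement non-interactively, and that it returns a non-zero status while Impala is not yet reachable. `wait_for` and `run_sql` are hypothetical helper names, and the final query assumes `my_first_table` has already been created.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay_seconds> <command> [args...]
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Hypothetical helper: run one SQL statement inside the kudu-impala container.
run_sql() {
  docker exec kudu-impala impala-shell -q "$1"
}

# Only talk to Docker when the CLI is installed and the container exists.
if command -v docker >/dev/null 2>&1 && docker inspect kudu-impala >/dev/null 2>&1; then
  # Poll with a trivial query until Impala accepts connections (up to ~60 seconds).
  if wait_for 30 2 run_sql "SELECT 1"; then
    run_sql "SELECT count(*) FROM my_first_table"
  else
    echo "Impala did not become ready in time" >&2
  fi
fi
```

The retry count and delay are arbitrary choices for illustration. The interactive `connect;` approach described in the README remains the simpler option when working in the shell by hand.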
diff --git a/examples/quickstart/nifi/README.adoc b/examples/quickstart/nifi/README.adoc
index 0f7c79d..de91033 100644
--- a/examples/quickstart/nifi/README.adoc
+++ b/examples/quickstart/nifi/README.adoc
@@ -31,7 +31,7 @@ Use the following command to run the latest Apache NiFi Docker image:
 
 [source,bash]
 ----
-docker run --name kudu-nifi --network="docker_default" -p 8080:8080 apache/nifi:latest
+docker run -d --name kudu-nifi --network="docker_default" -p 8080:8080 apache/nifi:latest
 ----
 
 You can view the running NiFi instance at link:http://localhost:8080/nifi[localhost:8080/nifi].
@@ -39,7 +39,7 @@ You can view the running NiFi instance at link:http://localhost:8080/nifi[localh
 NOTE: `--network="docker_default"` is specified to connect the container the
 same network as the quickstart cluster.
 
-NOTE: You can include the `-d` flag to run the container in the background.
+NOTE: You can remove the `-d` flag to run the container in the foreground.
 
 == Create the Kudu table
 
@@ -131,7 +131,7 @@ link:https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#starting-a-comp
 == Shutdown NiFi
 
 Once you are done with the NiFi container you can shutdown in a couple of ways.
-If you ran NiFi without the `-d` flag, you can use `ctrl + c` to stop the  container.
+If you ran NiFi without the `-d` flag, you can use `ctrl + c` to stop the container.
 
 If you ran NiFi with the `-d` flag, you can use the following to
 gracefully shutdown the container:
@@ -180,3 +180,8 @@ random_user.createOrReplaceTempView("random_user")
 spark.sql("SELECT count(*) FROM random_user").show()
 spark.sql("SELECT * FROM random_user LIMIT 5").show()
 ----
+
+== Help
+
+If you have questions, issues, or feedback on this quickstart guide, please reach out to the
+link:https://kudu.apache.org/community.html[Apache Kudu community].
diff --git a/examples/quickstart/spark/README.adoc b/examples/quickstart/spark/README.adoc
index e833838..3ff435c 100644
--- a/examples/quickstart/spark/README.adoc
+++ b/examples/quickstart/spark/README.adoc
@@ -223,3 +223,8 @@ mutate data in a streaming fashion.
 As an exercise to learn the Kudu programmatic APIs, try implementing a program
 that uses the link:http://www.nextbus.com/xmlFeedDocs/NextBusXMLFeed.pdf[SFMTA XML data feed]
 to ingest this same dataset in real time into the Kudu table.
+
+== Help
+
+If you have questions, issues, or feedback on this quickstart guide, please reach out to the
+link:https://kudu.apache.org/community.html[Apache Kudu community].
diff --git a/examples/quickstart/ycsb/README.adoc b/examples/quickstart/ycsb/README.adoc
index 3618c6c..790d711 100644
--- a/examples/quickstart/ycsb/README.adoc
+++ b/examples/quickstart/ycsb/README.adoc
@@ -96,3 +96,9 @@ pushd YCSB
 mvn -Psource-run -pl site.ycsb:kudu-binding -am clean package -DskipTests -Dkudu.version=1.12.0-SNAPSHOT
 popd
 ----
+
+
+== Help
+
+If you have questions, issues, or feedback on this quickstart guide, please reach out to the
+link:https://kudu.apache.org/community.html[Apache Kudu community].